The chatglm-llama-chinese-insturct from 27182812

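Evaluation datasets

Has GLM been compared with other LLMs on public datasets? In other words, is there an evaluation benchmark for judging whether it is good or bad?

After fine-tuning, chatglm-lora.pt is 6.4G. Is something wrong?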
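
A rough size check may help with this question. The sketch below estimates how large a LoRA-only checkpoint would be, assuming ChatGLM-6B's 28 layers and hidden size of 4096, LoRA applied only to the fused `query_key_value` projection, rank 8, and fp32 storage. The rank, target module, and dtype are assumptions, not values confirmed in this thread.

```python
# Back-of-the-envelope size of a LoRA-only checkpoint for ChatGLM-6B.
# Assumptions (not confirmed by the thread): rank 8, LoRA on
# query_key_value only, weights stored as fp32.
HIDDEN = 4096          # ChatGLM-6B hidden size
NUM_LAYERS = 28        # ChatGLM-6B transformer layers
QKV_OUT = 3 * HIDDEN   # width of the fused query/key/value projection
RANK = 8               # assumed LoRA rank
BYTES_PER_PARAM = 4    # fp32

# Each adapted layer adds A (RANK x HIDDEN) and B (QKV_OUT x RANK).
params_per_layer = RANK * (HIDDEN + QKV_OUT)
total_params = params_per_layer * NUM_LAYERS
size_mib = total_params * BYTES_PER_PARAM / 2**20

print(f"LoRA parameters: {total_params:,}")
print(f"Approximate checkpoint size: {size_mib:.1f} MiB")
```

Under these assumptions an adapter-only file would be on the order of tens of megabytes, so a 6.4G file may mean the save step also wrote the base model weights. Saving only the parameters that have `requires_grad=True` is one common way to avoid that.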

AttributeError: /root/anaconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

AttributeError: /root/anaconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
What is causing this?
GPU: A6000

I am using a T4 (16 GB) and training for 3000 steps. Some of the parameters are as follows:
--max_source_length 64
--max_target_length 64
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--gradient_accumulation_steps 16
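The dataset is 54 MB, and I found that training takes 7 hours. Would a 3090 or a V100 be faster? What class of GPU is needed to keep training speed acceptable?

How much GPU memory does training need?

Can a single 3090 be used for training, and how is multi-GPU training supported?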
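
To put the 7-hour figure in context, the arithmetic below converts the reported settings into time per optimizer step and per sample. It assumes that "3000 steps" counts optimizer steps (each made of 16 accumulated micro-batches) and that a single GPU is used; neither is stated explicitly in the post.

```python
# Throughput implied by the reported run.
# Assumption: 3000 steps are optimizer steps on a single GPU.
per_device_batch = 1
grad_accum = 16
steps = 3000
hours = 7

total_seconds = hours * 3600
samples_seen = steps * grad_accum * per_device_batch
sec_per_step = total_seconds / steps
sec_per_sample = total_seconds / samples_seen

print(f"samples processed: {samples_seen}")
print(f"seconds per optimizer step: {sec_per_step:.2f}")
print(f"seconds per sample: {sec_per_sample:.3f}")
```

Measuring the same two numbers on another card for a few dozen steps gives a direct comparison without waiting for a full run.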

peft error

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /root/miniconda3/envs/bab/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda118.so...
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|██████████████████████████████████████████| 8/8 [00:09<00:00,  1.23s/it]
Traceback (most recent call last):
  File "/root/autodl-tmp/ChatGLM-chinese-insturct/finetune.py", line 168, in <module>
    main()
  File "/root/autodl-tmp/ChatGLM-chinese-insturct/finetune.py", line 147, in main
    model = get_peft_model(model, peft_config)
  File "/root/miniconda3/envs/bab/lib/python3.10/site-packages/peft/mapping.py", line 142, in get_peft_model
    peft_config = _prepare_lora_config(peft_config, model_config)
  File "/root/miniconda3/envs/bab/lib/python3.10/site-packages/peft/mapping.py", line 117, in _prepare_lora_config
    raise ValueError("Please specify `target_modules` in `peft_config`")
ValueError: Please specify `target_modules` in `peft_config`
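
The error above asks for `target_modules` to be set explicitly. A minimal sketch of a LoRA configuration that does so is shown here, assuming the base model is ChatGLM-6B, whose attention layers use a fused projection named `query_key_value`. The rank, alpha, and dropout values are placeholders rather than the project's settings, and the import is guarded so the snippet still runs where `peft` is not installed.

```python
# Hypothetical LoRA settings; only target_modules addresses the error above.
lora_kwargs = dict(
    r=8,                                 # placeholder rank
    lora_alpha=32,                       # placeholder scaling factor
    lora_dropout=0.1,                    # placeholder dropout
    target_modules=["query_key_value"],  # fused attention projection in ChatGLM-6B
    task_type="CAUSAL_LM",
)

try:
    from peft import LoraConfig
    peft_config = LoraConfig(**lora_kwargs)
except ImportError:
    # peft is not available in this environment; keep the plain dict only.
    peft_config = None

print(lora_kwargs["target_modules"])
```

The resulting `peft_config` would then be passed to `get_peft_model(model, peft_config)` as in the traceback. Upgrading `peft` to a version that knows a default mapping for the model type is another possible route.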

Data question

Could you share a link to the data?

Fine-tuning error: RuntimeError: CUDA error: device-side assert triggered

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I hit this error while fine-tuning with the data bundled with the project. How can I fix it?

How do I create the training sample file zh-data01.json?
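
The log itself suggests `CUDA_LAUNCH_BLOCKING=1`. The sketch below sets it from Python and adds a simple check for token ids outside the vocabulary, which is one frequent cause of this assert in embedding lookups. That cause is an assumption here, and `find_out_of_range_ids` and the example vocabulary size are illustrative, not part of the project.

```python
import os

# Must be set before CUDA is initialised (i.e. before the first CUDA call),
# so that the failing kernel is reported at the right place in the traceback.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_out_of_range_ids(input_ids, vocab_size):
    """Return (position, token_id) pairs that an embedding table cannot index."""
    return [
        (pos, tok)
        for pos, tok in enumerate(input_ids)
        if not 0 <= tok < vocab_size
    ]


# Illustrative values only: ids from one tokenizer revision checked
# against a smaller vocabulary from another revision.
example_ids = [5, 20005, 150004]
bad = find_out_of_range_ids(example_ids, vocab_size=130528)
print(bad)
```

Running the same batch on CPU is another way to get a readable error message instead of the asynchronous CUDA assert.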

https://raw.githubusercontent.com/27182812/ChatGLM-LLaMA-chinese-insturct/main/data/zh-data01.json
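
A minimal sketch of building such a file follows. It assumes the data uses Alpaca-style records with `instruction`, `input`, and `output` fields; check the linked file to confirm the actual field names before relying on this. The sample record reuses a question and answer quoted elsewhere on this page.

```python
import json


def make_record(instruction, output, input_text=""):
    """Build one training record (field names are assumed, Alpaca-style)."""
    return {"instruction": instruction, "input": input_text, "output": output}


records = [
    make_record("你是谁？", "我是ChatGPT"),
]

# ensure_ascii=False keeps Chinese text readable in the file.
serialized = json.dumps(records, ensure_ascii=False, indent=2)
print(serialized)

# To write it out (file name is hypothetical):
# from pathlib import Path
# Path("my-data.json").write_text(serialized, encoding="utf-8")
```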

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

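Fine-tuning after extending the Chinese vocabulary

Hello, I am glad to see you so quickly released fine-tuning results based on the vocabulary-extended LLaMA. I have a question: according to its training details, the vocabulary-extended LLaMA was itself pre-trained with LoRA, so why do I not see any LoRA parameter layers in the model structure after merging the model? When you fine-tune with LoRA on top of it, do you simply treat its LLaMA model as the base model and initialize a new LoRA structure for fine-tuning? If you see this, could you help answer? Thanks.

It looks like this code converts Chinese punctuation to English punctuation when processing the data. Could you point out where that part of the code is?

Does plain ChatGLM-6B actually work better?

First of all, thank you for open-sourcing the training method.
For the same question, your screenshot also lists 10 items for a camping trip but does not explain what each item is for, whereas ChatGLM-6B does explain each item. I only tried this one example; your model may perform better in other respects.

Tensor size mismatch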

RuntimeError: The size of tensor a (36) must match the size of tensor b (26) at non-singleton dimension 0

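This happens both with my own dataset and with the dataset provided in the project. Has anyone run into the same error, or is there a known fix?

Why is the training data limited to a maximum length of 320?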
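
A length cap like this usually bounds memory use and time per step, since the cost of attention grows with sequence length. The thread does not say whether the project drops or truncates long samples; the sketch below drops them, and `split_by_length` with its character-level `tokenize=list` stand-in is purely illustrative.

```python
MAX_LEN = 320  # the cap mentioned in the question


def split_by_length(examples, tokenize, max_len=MAX_LEN):
    """Separate examples whose token count fits within max_len from the rest."""
    kept, dropped = [], []
    for text in examples:
        target = kept if len(tokenize(text)) <= max_len else dropped
        target.append(text)
    return kept, dropped


# `list` splits a string into characters; a real run would pass the
# model tokenizer's encode function instead.
kept, dropped = split_by_length(["short", "x" * 400], tokenize=list)
print(len(kept), len(dropped))
```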

Can GLM-130B be trained into a ChatGLM-130B?

With 8×A100 (80G×8=640G) available, would it be possible to train a ChatGLM-130B on top of GLM-130B?
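
A rough memory estimate helps frame this question. The figures below assume the common rule of thumb of about 16 bytes per parameter for full fine-tuning with Adam in mixed precision (fp16 weights and gradients plus fp32 master weights and two optimizer states), and they ignore activations. This is a generic estimate, not a statement about GLM-130B's actual training setup.

```python
# Rule-of-thumb memory estimate; activations are ignored.
PARAMS = 130 * 10**9             # GLM-130B parameter count
NUM_GPUS = 8
GPU_MEM_GB = 80

BYTES_FP16_WEIGHTS = 2           # weights only, e.g. for inference
BYTES_FULL_FINETUNE = 16         # assumed: 2 + 2 + 4 + 4 + 4 bytes per parameter

available_gb = NUM_GPUS * GPU_MEM_GB
weights_gb = PARAMS * BYTES_FP16_WEIGHTS / 10**9
full_finetune_gb = PARAMS * BYTES_FULL_FINETUNE / 10**9

print(f"available: {available_gb} GB")
print(f"fp16 weights only: {weights_gb:.0f} GB")
print(f"full fine-tuning (weights, grads, Adam states): {full_finetune_gb:.0f} GB")
```

Under this estimate the fp16 weights alone fit in 640 GB, but full fine-tuning does not, so it would likely require offloading or a parameter-efficient method such as LoRA.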

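Strange replies

Run infer.py, then enter the prompt "生成一个人野营旅行可能需要的十件物品的清单" ("Generate a list of ten items a person might need for a camping trip").

Entering "你好" ("Hello") produces a long string of strange output, as shown in the figure below: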

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

│

│ 273 │ │ │ │ tokens.append(self.sp_tokenizer.decode(single_token_ids)) │
│ 274 │ │ │ return (tokens) │
│ 275 │ │ else: │
│ ❱ 276 │ │ │ if self.pad_token_id in token_ids: # remove pad │
│ 277 │ │ │ │ token_ids = list(filter((self.pad_token_id).ne, token_ids)) │
│ 278 │ │ │ return self.sp_tokenizer.decode(token_ids) │
│ 279 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
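
One possible reading of this traceback: `self.pad_token_id in token_ids` compares the id with each element using `==`, and if an element is itself a multi-element tensor (for example when a batch is passed instead of one sequence), the result has no single truth value. Under that assumption, converting the ids to a flat Python list before decoding avoids the comparison problem. `to_id_list` is a hypothetical helper, not part of the tokenizer.

```python
def _to_plain(value):
    """Recursively convert tensors/arrays (anything with tolist) to Python ints."""
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return [_to_plain(item) for item in value]
    return int(value)


def to_id_list(token_ids):
    """Return one sequence of token ids as a flat list of ints."""
    ids = _to_plain(token_ids)
    if not isinstance(ids, list):
        ids = [ids]
    # A batch of shape (1, seq_len) arrives as [[...]]; unwrap it.
    if ids and isinstance(ids[0], list):
        if len(ids) != 1:
            raise ValueError("decode one sequence at a time")
        ids = ids[0]
    return ids


print(to_id_list([[3, 20005, 7]]))
```

In infer.py this would be applied to the generated ids just before calling the tokenizer's `decode`.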

bug

For LLaMA generation, the line `from dataprocess import format_example` looks wrong, doesn't it?

ValueError: 150004 is not in list. This seems to be a problem with input_ids; could you take a look?
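
The message has the shape of Python's `list.index` failing, which suggests that some code searched `input_ids` for token id 150004 and did not find it. One possible cause, stated here as an assumption, is that the tokenizer files and the model code come from different revisions and disagree on special token ids. The hypothetical helper below checks a sequence before it reaches the model.

```python
BOS_ID = 150004  # the id named in the error message


def find_bos(input_ids, bos_id=BOS_ID):
    """Return the position of bos_id in input_ids, or None if it is absent."""
    ids = list(input_ids)
    if bos_id not in ids:
        return None
    return ids.index(bos_id)


# list.index raises ValueError when the value is missing, which is the
# kind of failure reported above.
try:
    [5, 6, 7].index(BOS_ID)
    index_failed = False
except ValueError:
    index_failed = True

print(find_bos([5, BOS_ID, 7]), find_bos([5, 6, 7]), index_failed)
```

If `find_bos` returns `None` for freshly tokenized samples, comparing the tokenizer's special token ids with the ones the model code expects is a reasonable next step.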

finetuning

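I built a simple fine-tuning dataset of just ten sentences, all roughly "你是谁？" ("Who are you?") with the answer "我是ChatGPT" ("I am ChatGPT"). I then fine-tuned on it, but no matter how many iterations I set, the training loss is always 0: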

/root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
 88%|██████████████████████████████████████████████████████████▋        | 35/40 [01:05<00:05,  1.03s/it]/root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
{'train_runtime': 83.999, 'train_samples_per_second': 0.476, 'train_steps_per_second': 0.476, 'train_loss': 0.0, 'epoch': 10.0}
100%|███████████████████████████████████████████████████████████████████| 40/40 [01:23<00:00,  2.10s/it]

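At inference time I then always get the same answer:

Q: 你是谁 (Who are you)
A: I'm sorry, I'm not sure what you're asking. Could you please provide more context or clarify your question?

When I ask "你是谁" several times in a row, the answer changes, but it is still wrong:

Q: 你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁?
A: 你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁? I am an AI assistant. I cannot perceive the real world, nor can I think and feel like a human. I can only answer questions and provide help through text and language. If you have any questions, feel free to ask me at any time and I will do my best to answer. Thank you for your question. If there is anything you need help with, please let me know.

Specify the device

          Specify the device

Originally posted by @27182812 in #19 (comment)
Pinning the run to a single GPU works, but how do I train on multiple GPUs?

Why do generated replies contain an extra "question"?

Every generated reply is followed by another "question" and "reply".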

what is the difference between this repo and ChatGLM-Tuning

https://github.com/mymusise/ChatGLM-Tuning

'ChatGLMForConditionalGeneration' object has no attribute 'enable_input_require_grads'

Hello,
Running bash finetune.sh fails with the error below. Could you take a look?

ChatGLM-chinese-insturct/finetune.py:133 in main │
│ │
│ 130 │ │ "THUDM/chatglm-6b", load_in_8bit=True, trust_remote_code=True, device_map=device │
│ 131 │ ) │
│ 132 │ model.gradient_checkpointing_enable() │
│ ❱ 133 │ model.enable_input_require_grads() │
│ 134 │ model.is_parallelizable = True │
│ 135 │ model.model_parallel = True │
│ 136 │ model.lm_head = CastOutputToFloat(model.lm_head)

AttributeError: 'ChatGLMForConditionalGeneration' object has no attribute 'enable_input_require_grads'
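
This attribute is missing when the installed `transformers` release predates `enable_input_require_grads`, so upgrading `transformers` is one option. As a fallback, the sketch below registers the same kind of forward hook on the input embeddings that the method sets up in recent versions. `ensure_input_grads` is a hypothetical helper, and it assumes the model exposes `get_input_embeddings()`.

```python
def ensure_input_grads(model):
    """Make embedding outputs require grad, using the built-in method if present."""
    if hasattr(model, "enable_input_require_grads"):
        model.enable_input_require_grads()
        return "builtin"

    def make_inputs_require_grad(module, inputs, output):
        # Lets gradients flow into checkpointed blocks even though the
        # embedding weights themselves are frozen.
        output.requires_grad_(True)

    model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)
    return "hook"
```

In finetune.py this would replace the direct call on line 133, for example `ensure_input_grads(model)`.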

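How do I include the history in the input?

llama-7b: after fine-tuning on my own data, the answer keeps repeating the same sentence, and it is not the correct answer either

Is there a quality comparison?

Thanks! Are there any comparison results?

Which direction did you fine-tune for?

Was the fine-tuning done in a Google notebook (Colab)? What configuration is needed to run it locally?

Did the author train on an A100? It works for me on an A100, but on a V100 I get an error, and after adjusting the settings the loss is 0

On an A100 training runs normally and the loss is non-zero, but it stays at around 2.x and never drops below 1.
On a V100 I get the error "expected scalar type Half but found Float". After removing the fp16 argument it runs, but loss=0. Setting the learning rate to e-4, e-6, or e-7 makes no difference; the loss is always 0.
I would like to know roughly which parameters and data the author used on the A100, what the loss curve looked like, and about how long it took for the loss to drop below 1.
And how should training be done on a V100?
Thank you very much.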

ValueError: Can't find config.json at './best_ckpt'

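The file is clearly there, yet it cannot be found. Using an absolute path does not help either. What is going on?

Error: RuntimeError: expected scalar type Half but found Float

Launched via finetune.sh with ddp set to false; the conda environment was created from env.yml.
The model loads successfully, but a precision-related error appears to be raised during the backward pass. What might cause this kind of problem?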
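
A quick way to narrow this down is to check what the process actually sees at that path. The sketch below resolves the directory and reports which config files exist. It also checks for `adapter_config.json`, on the assumption that the error comes from `peft` loading a LoRA adapter; some `peft` versions appear to print this "config.json" message while looking for `adapter_config.json`, which is worth verifying against the installed version.

```python
from pathlib import Path


def inspect_checkpoint_dir(path):
    """Report how a checkpoint path resolves and which config files it holds."""
    ckpt = Path(path).expanduser().resolve()
    return {
        "resolved_path": str(ckpt),
        "is_dir": ckpt.is_dir(),
        "has_config_json": (ckpt / "config.json").is_file(),
        "has_adapter_config_json": (ckpt / "adapter_config.json").is_file(),
    }


print(inspect_checkpoint_dir("./best_ckpt"))
```

If `resolved_path` is not the expected directory, the script is probably being run from a different working directory than assumed.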
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 8/8 [00:23<00:00, 2.94s/it]
/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
0%| | 0/30699 [00:00<?, ?it/s]/opt/conda/envs/gpt0/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
  File "/mnt/workspace/ChatGLM-chinese-insturct/finetune.py", line 172, in <module>
    main()
  File "/mnt/workspace/ChatGLM-chinese-insturct/finetune.py", line 163, in main
    trainer.train()
  File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/trainer.py", line 1633, in train
    return inner_training_loop(
  File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/trainer.py", line 2655, in training_step
    self.scaler.scale(loss).backward()
  File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
  File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 456, in backward
    grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected scalar type Half but found Float

Error: "error: the following arguments are required: --output_dir". How should I fix this?

Full error:
usage: finetune.py [-h] [--dataset_path DATASET_PATH]
[--model_path MODEL_PATH] [--lora_rank LORA_RANK]
--output_dir OUTPUT_DIR
[--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]]
[--do_train [DO_TRAIN]] [--do_eval [DO_EVAL]]
[--do_predict [DO_PREDICT]]
[--evaluation_strategy {no,steps,epoch}]
[--prediction_loss_only [PREDICTION_LOSS_ONLY]]
[--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE]
[--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE]
[--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
[--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--eval_accumulation_steps EVAL_ACCUMULATION_STEPS]
[--eval_delay EVAL_DELAY] [--learning_rate LEARNING_RATE]
[--weight_decay WEIGHT_DECAY] [--adam_beta1 ADAM_BETA1]
[--adam_beta2 ADAM_BETA2] [--adam_epsilon ADAM_EPSILON]
[--max_grad_norm MAX_GRAD_NORM]
[--num_train_epochs NUM_TRAIN_EPOCHS]
[--max_steps MAX_STEPS]
[--lr_scheduler_type {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
[--warmup_ratio WARMUP_RATIO] [--warmup_steps WARMUP_STEPS]
[--log_level {debug,info,warning,error,critical,passive}]
[--log_level_replica {debug,info,warning,error,critical,passive}]
[--log_on_each_node [LOG_ON_EACH_NODE]]
[--no_log_on_each_node] [--logging_dir LOGGING_DIR]
[--logging_strategy {no,steps,epoch}]
[--logging_first_step [LOGGING_FIRST_STEP]]
[--logging_steps LOGGING_STEPS]
[--logging_nan_inf_filter [LOGGING_NAN_INF_FILTER]]
[--no_logging_nan_inf_filter]
[--save_strategy {no,steps,epoch}]
[--save_steps SAVE_STEPS]
[--save_total_limit SAVE_TOTAL_LIMIT]
[--save_on_each_node [SAVE_ON_EACH_NODE]]
[--no_cuda [NO_CUDA]] [--use_mps_device [USE_MPS_DEVICE]]
[--seed SEED] [--data_seed DATA_SEED]
[--jit_mode_eval [JIT_MODE_EVAL]] [--use_ipex [USE_IPEX]]
[--bf16 [BF16]] [--fp16 [FP16]]
[--fp16_opt_level FP16_OPT_LEVEL]
[--half_precision_backend {auto,cuda_amp,apex,cpu_amp}]
[--bf16_full_eval [BF16_FULL_EVAL]]
[--fp16_full_eval [FP16_FULL_EVAL]] [--tf32 TF32]
[--local_rank LOCAL_RANK] [--xpu_backend {mpi,ccl,gloo}]
[--tpu_num_cores TPU_NUM_CORES]
[--tpu_metrics_debug [TPU_METRICS_DEBUG]] [--debug DEBUG]
[--dataloader_drop_last [DATALOADER_DROP_LAST]]
[--eval_steps EVAL_STEPS]
[--dataloader_num_workers DATALOADER_NUM_WORKERS]
[--past_index PAST_INDEX] [--run_name RUN_NAME]
[--disable_tqdm DISABLE_TQDM]
[--remove_unused_columns [REMOVE_UNUSED_COLUMNS]]
[--no_remove_unused_columns]
[--label_names LABEL_NAMES [LABEL_NAMES ...]]
[--load_best_model_at_end [LOAD_BEST_MODEL_AT_END]]
[--metric_for_best_model METRIC_FOR_BEST_MODEL]
[--greater_is_better GREATER_IS_BETTER]
[--ignore_data_skip [IGNORE_DATA_SKIP]]
[--sharded_ddp SHARDED_DDP] [--fsdp FSDP]
[--fsdp_min_num_params FSDP_MIN_NUM_PARAMS]
[--fsdp_transformer_layer_cls_to_wrap FSDP_TRANSFORMER_LAYER_CLS_TO_WRAP]
[--deepspeed DEEPSPEED]
[--label_smoothing_factor LABEL_SMOOTHING_FACTOR]
[--optim {adamw_hf,adamw_torch,adamw_torch_xla,adamw_apex_fused,adafactor,adamw_bnb_8bit,adamw_anyprecision,sgd,adagrad}]
[--optim_args OPTIM_ARGS] [--adafactor [ADAFACTOR]]
[--group_by_length [GROUP_BY_LENGTH]]
[--length_column_name LENGTH_COLUMN_NAME]
[--report_to REPORT_TO [REPORT_TO ...]]
[--ddp_find_unused_parameters DDP_FIND_UNUSED_PARAMETERS]
[--ddp_bucket_cap_mb DDP_BUCKET_CAP_MB]
[--dataloader_pin_memory [DATALOADER_PIN_MEMORY]]
[--no_dataloader_pin_memory]
[--skip_memory_metrics [SKIP_MEMORY_METRICS]]
[--no_skip_memory_metrics]
[--use_legacy_prediction_loop [USE_LEGACY_PREDICTION_LOOP]]
[--push_to_hub [PUSH_TO_HUB]]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
[--hub_model_id HUB_MODEL_ID]
[--hub_strategy {end,every_save,checkpoint,all_checkpoints}]
[--hub_token HUB_TOKEN]
[--hub_private_repo [HUB_PRIVATE_REPO]]
[--gradient_checkpointing [GRADIENT_CHECKPOINTING]]
[--include_inputs_for_metrics [INCLUDE_INPUTS_FOR_METRICS]]
[--fp16_backend {auto,cuda_amp,apex,cpu_amp}]
[--push_to_hub_model_id PUSH_TO_HUB_MODEL_ID]
[--push_to_hub_organization PUSH_TO_HUB_ORGANIZATION]
[--push_to_hub_token PUSH_TO_HUB_TOKEN]
[--mp_parameters MP_PARAMETERS]
[--auto_find_batch_size [AUTO_FIND_BATCH_SIZE]]
[--full_determinism [FULL_DETERMINISM]]
[--torchdynamo {eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--ray_scope RAY_SCOPE] [--ddp_timeout DDP_TIMEOUT]
[--torch_compile [TORCH_COMPILE]]
[--torch_compile_backend {eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--torch_compile_mode {default,reduce-overhead,max-autotune}]
finetune.py: error: the following arguments are required: --output_dir

Process finished with exit code 2

conda environment error
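
The usage text shows `--output_dir` without square brackets, meaning it is the one required argument, and exit code 2 is what an argument parser returns when a required argument is missing. The sketch below reproduces that behaviour with plain `argparse` as a stand-in for the script's real parser; only the three flag names are taken from the usage text, and `./output` is a hypothetical path.

```python
import argparse


def build_parser():
    """Stand-in parser with a few of the flags from the usage text."""
    parser = argparse.ArgumentParser(prog="finetune.py")
    parser.add_argument("--dataset_path")
    parser.add_argument("--lora_rank", type=int)
    parser.add_argument("--output_dir", required=True)  # the missing argument
    return parser


# Supplying the required flag makes parsing succeed.
args = build_parser().parse_args(["--output_dir", "./output"])
print(args.output_dir)
```

When launching from an IDE, the same flag goes into the run configuration's script parameters. When launching from the shell, it is appended to the `python finetune.py` command.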

ERROR: Could not find a version that satisfies the requirement bitsandbytes==0.37.2 (from versions: 0.31.8, 0.32.0, 0.32.1, 0.32.2, 0.32.3, 0.33.0, 0.33.1, 0.34.0, 0.35.0, 0.35.1, 0.35.2, 0.35.3, 0.35.4, 0.36.0, 0.36.0.post1, 0.36.0.post2, 0.37.0, 0.37.1)
ERROR: No matching distribution found for bitsandbytes==0.37.2

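This error occurred while installing this package during creation of the conda virtual environment. Is it a problem with the pip version?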
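
The pip message lists every version the configured package index offered, and 0.37.2 is not among them. That points to the index or mirror in use rather than to pip itself, although this is an inference from the log, not a confirmed diagnosis. The sketch below parses the list from the error and finds the newest release that index did offer.

```python
# Versions copied from the pip error message above.
available = (
    "0.31.8, 0.32.0, 0.32.1, 0.32.2, 0.32.3, 0.33.0, 0.33.1, 0.34.0, "
    "0.35.0, 0.35.1, 0.35.2, 0.35.3, 0.35.4, 0.36.0, 0.36.0.post1, "
    "0.36.0.post2, 0.37.0, 0.37.1"
).split(", ")
wanted = "0.37.2"


def release_key(version):
    """Sort key built from the numeric major.minor.patch part of a version."""
    return tuple(int(part) for part in version.split(".")[:3])


newest = max(available, key=release_key)
print(f"{wanted} offered by this index: {wanted in available}")
print(f"newest version offered: {newest}")
```

Two options follow from this: install from an index that carries 0.37.2, or relax the pin in env.yml to the newest version the index offers and test whether training still works.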
