27182812 / chatglm-llama-chinese-insturct
Exploring the fine-tuning performance of Chinese instruct data on ChatGLM and LLaMA
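Has GLM been compared against other LLMs on public datasets? Put differently, is there an evaluation standard for judging whether it is good or bad?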
AttributeError: /root/anaconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
What is causing this?
GPU: A6000.
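The error above names `libbitsandbytes_cpu.so`, which suggests bitsandbytes loaded its CPU-only binary instead of a CUDA build (another report on this page shows it loading `libbitsandbytes_cuda118.so`). As a first diagnostic, the sketch below lists the binaries bundled with the installed package. `list_bnb_binaries` is a hypothetical helper written for this illustration; it uses only the standard library and does not import bitsandbytes.

```python
import importlib.util
from pathlib import Path


def list_bnb_binaries(package: str = "bitsandbytes") -> list[str]:
    """Return the names of the shared libraries bundled with the package.

    Hypothetical helper: it only inspects files on disk and never imports
    the package, so it also works when the import itself crashes.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed (or not a regular package)
    names = []
    for location in spec.submodule_search_locations:
        names.extend(p.name for p in Path(location).glob("libbitsandbytes*.so"))
    return sorted(names)


if __name__ == "__main__":
    binaries = list_bnb_binaries()
    print("bundled binaries:", binaries)
    if binaries and all("cpu" in name for name in binaries):
        print("only the CPU build is present; int8 on GPU needs a CUDA build")
```

If only the CPU build shows up, or a CUDA build exists but was not selected, the CUDA setup messages that bitsandbytes prints at import time are the next thing to check.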
I am training for 3000 steps on a T4 (16 GB). Some of the parameters are below:
--max_source_length 64
--max_target_length 64
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--gradient_accumulation_steps 16
With a 54 MB dataset this turns out to take 7 hours. Would a 3090 or V100 be faster? What class of GPU is needed to keep training speed acceptable?
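To compare GPUs it helps to turn the reported figures into a per-step and per-sample cost first. The arithmetic below is a rough sketch that assumes the 3000 steps are optimizer steps (each one covering `gradient_accumulation_steps` micro-batches) and that the 7 hours cover training only.

```python
# Figures taken from the report above.
per_device_batch_size = 1
gradient_accumulation_steps = 16
optimizer_steps = 3000
wall_clock_hours = 7

# Samples consumed per optimizer step on one GPU.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
total_samples = effective_batch_size * optimizer_steps
seconds_per_step = wall_clock_hours * 3600 / optimizer_steps
seconds_per_sample = seconds_per_step / effective_batch_size

print(f"effective batch size: {effective_batch_size}")    # 16
print(f"samples processed:    {total_samples}")           # 48000
print(f"seconds per step:     {seconds_per_step:.1f}")    # 8.4
print(f"seconds per sample:   {seconds_per_sample:.3f}")  # 0.525
```

Timing a few dozen steps on another card and comparing its seconds per sample against this baseline gives a direct speedup estimate without running the full job.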
Can this be trained on a single 3090, and how can multi-GPU training be enabled?
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /root/miniconda3/envs/bab/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda118.so...
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|██████████████████████████████████████████| 8/8 [00:09<00:00, 1.23s/it]
Traceback (most recent call last):
File "/root/autodl-tmp/ChatGLM-chinese-insturct/finetune.py", line 168, in <module>
main()
File "/root/autodl-tmp/ChatGLM-chinese-insturct/finetune.py", line 147, in main
model = get_peft_model(model, peft_config)
File "/root/miniconda3/envs/bab/lib/python3.10/site-packages/peft/mapping.py", line 142, in get_peft_model
peft_config = _prepare_lora_config(peft_config, model_config)
File "/root/miniconda3/envs/bab/lib/python3.10/site-packages/peft/mapping.py", line 117, in _prepare_lora_config
raise ValueError("Please specify `target_modules` in `peft_config`")
ValueError: Please specify `target_modules` in `peft_config`
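The `ValueError` above means the installed peft has no default LoRA target for this model type, so the config must name the layers explicitly. A minimal sketch follows, assuming the ChatGLM-6B attention projection is named `query_key_value` (other models use different names, for example `q_proj` and `v_proj` in LLaMA). `build_lora_kwargs` is a hypothetical helper, and the `lora_alpha` and `lora_dropout` values are illustrative only, not the repository's settings.

```python
def build_lora_kwargs(lora_rank: int = 8) -> dict:
    """Keyword arguments for peft.LoraConfig with explicit target modules.

    Hypothetical helper; the numeric values are placeholders to tune.
    """
    return {
        "task_type": "CAUSAL_LM",
        "r": lora_rank,
        "lora_alpha": 32,       # illustrative value
        "lora_dropout": 0.1,    # illustrative value
        # Assumed layer name for ChatGLM-6B; adjust for other architectures.
        "target_modules": ["query_key_value"],
    }


if __name__ == "__main__":
    kwargs = build_lora_kwargs()
    try:
        from peft import LoraConfig
    except ImportError:
        print("peft is not installed; kwargs would be:", kwargs)
    else:
        # The resulting config is what get_peft_model(model, peft_config) expects.
        print(LoraConfig(**kwargs))
```

Printing the model's module names with `model.named_modules()` is a quick way to confirm the right layer name before training.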
Could you share a link to the data?
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I hit this error while fine-tuning with the data bundled in the project. How can it be resolved?
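Hello, glad to see you added fine-tuning results for the vocabulary-extended LLaMA so quickly. One question: according to its training details, the vocabulary-extended LLaMA was itself pre-trained with LoRA, so why do I see no LoRA parameter layers in the model structure after merging the model? For your LoRA fine-tuning on top of it, did you treat its LLaMA model as the base model and initialize a fresh LoRA structure? If you see this, I would appreciate an answer. Thanks.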
RuntimeError: The size of tensor a (36) must match the size of tensor b (26) at non-singleton dimension 0
This happens whether I use my own dataset or the one shipped with the project. Has anyone hit the same error, or is there a known fix?
With 8×A100 (80 GB × 8 = 640 GB), would it be possible to train a ChatGLM-130B on top of GLM-130B?
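A back-of-the-envelope memory estimate helps frame this question. The sketch below uses a common rule of thumb for mixed-precision Adam training of about 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two optimizer moments) and ignores activations. These are assumptions for illustration, not measurements of GLM-130B.

```python
def estimate_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory in GB (10^9 bytes) for a given per-parameter cost."""
    return num_params * bytes_per_param / 1e9


num_params = 130e9          # 130B parameters
available_gb = 8 * 80       # 8 GPUs with 80 GB each

weights_fp16_gb = estimate_gb(num_params, 2)      # fp16 weights only
full_finetune_gb = estimate_gb(num_params, 16)    # rule-of-thumb Adam cost

print(f"available:           {available_gb} GB")           # 640 GB
print(f"fp16 weights:        {weights_fp16_gb:.0f} GB")    # 260 GB
print(f"full fine-tuning:    {full_finetune_gb:.0f} GB")   # 2080 GB
print("weights fit:        ", weights_fp16_gb < available_gb)
print("full fine-tune fits:", full_finetune_gb < available_gb)
```

Under these assumptions the fp16 weights fit in 640 GB but full fine-tuning does not, which points toward parameter-efficient methods such as LoRA, quantization, or offloading rather than updating every weight.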
│
│ 273 │ │ │ │ tokens.append(self.sp_tokenizer.decode(single_token_ids)) │
│ 274 │ │ │ return (tokens) │
│ 275 │ │ else: │
│ ❱ 276 │ │ │ if self.pad_token_id in token_ids: # remove pad │
│ 277 │ │ │ │ token_ids = list(filter((self.pad_token_id).ne, token_ids)) │
│ 278 │ │ │ return self.sp_tokenizer.decode(token_ids) │
│ 279 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
In the LLaMA generation code, `from dataprocess import format_example` looks like a mistake, doesn't it?
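I built a small fine-tuning dataset of my own: just ten sentences, nearly all of them "你是谁?" ("Who are you?"), each with the answer "我是ChatGPT" ("I am ChatGPT"). I then ran fine-tuning, but no matter how many iterations I configure, the training loss is always 0: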
/root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
88%|██████████████████████████████████████████████████████████▋ | 35/40 [01:05<00:05, 1.03s/it]/root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
{'train_runtime': 83.999, 'train_samples_per_second': 0.476, 'train_steps_per_second': 0.476, 'train_loss': 0.0, 'epoch': 10.0}
100%|███████████████████████████████████████████████████████████████████| 40/40 [01:23<00:00, 2.10s/it]
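At prediction time I then always get the same answer:
Q: 你是谁 (Who are you)
A: I'm sorry, I'm not sure what you're asking. Could you please provide more context or clarify your question?
When I repeat "你是谁" several times in one prompt the answer changes, but it is still wrong:
Q: 你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁? ("Who are you" repeated)
A: 你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁你是谁? I am an artificial intelligence assistant. I cannot perceive the real world, nor can I think and feel as humans do. I can only answer questions and offer help through text and language. If you have any question you need help with, feel free to ask me at any time. I will do my best to answer it. Thank you very much for asking; I will do my best to answer. If there is anything you need help with, please let me know.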
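A logged `train_loss` of exactly 0.0 combined with an unchanged model is worth treating as suspicious. As far as I know, the Hugging Face Trainer by default filters NaN and inf step losses out of its logging (the `--logging_nan_inf_filter` / `--no_logging_nan_inf_filter` flags appear in the usage output elsewhere on this page), so a run whose every step produced NaN could still report 0.0. The sketch below is a simplified model of that effect, not the Trainer's actual code; `logged_average` is a hypothetical function.

```python
import math


def logged_average(step_losses, nan_inf_filter=True):
    """Average loss over steps, optionally hiding NaN/inf steps.

    Simplified illustration: a filtered step is replaced by the mean of
    the steps seen so far (0.0 if it is the first step).
    """
    running_total = 0.0
    for step, loss in enumerate(step_losses):
        bad = math.isnan(loss) or math.isinf(loss)
        if nan_inf_filter and bad:
            loss = running_total / step if step else 0.0
        running_total += loss
    return running_total / len(step_losses)


nan = float("nan")
print(logged_average([nan, nan, nan, nan]))                        # 0.0
print(logged_average([nan, nan, nan, nan], nan_inf_filter=False))  # nan
print(logged_average([2.0, nan, 2.0]))                             # 2.0
```

Rerunning with `--no_logging_nan_inf_filter` should show whether the real per-step loss is NaN, which would shift the investigation toward numerical issues in the half-precision or int8 path rather than the dataset.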
Try specifying the device.
Originally posted by @27182812 in #19 (comment)
Pinning it to a single GPU works, but how do I train on multiple GPUs?
Hello,
Running `bash finetune.sh` fails with the error below. Could you take a look?
ChatGLM-chinese-insturct/finetune.py:133 in main │
│ │
│ 130 │ │ "THUDM/chatglm-6b", load_in_8bit=True, trust_remote_code=True, device_map=device │
│ 131 │ ) │
│ 132 │ model.gradient_checkpointing_enable() │
│ ❱ 133 │ model.enable_input_require_grads() │
│ 134 │ model.is_parallelizable = True │
│ 135 │ model.model_parallel = True │
│ 136 │ model.lm_head = CastOutputToFloat(model.lm_head)
AttributeError: 'ChatGLMForConditionalGeneration' object has no attribute 'enable_input_require_grads'
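The `AttributeError` suggests the installed `transformers` predates `enable_input_require_grads`, which newer releases provide on `PreTrainedModel`; upgrading is probably the cleaner fix. If upgrading is not an option, a fallback in the same spirit is to register a forward hook on the input embeddings. `ensure_input_require_grads` below is a hypothetical helper, and it assumes the model exposes `get_input_embeddings()`.

```python
def ensure_input_require_grads(model) -> str:
    """Make embedding outputs require grad, with or without the built-in method.

    Returns "builtin" if the model already provides the method, otherwise
    "hook" after registering a forward hook on the input embeddings.
    """
    if hasattr(model, "enable_input_require_grads"):
        model.enable_input_require_grads()
        return "builtin"

    def make_inputs_require_grad(module, inputs, output):
        # Needed so gradients can flow back through checkpointed layers
        # when the base weights themselves are frozen.
        output.requires_grad_(True)

    model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)
    return "hook"


# Usage in place of the failing line in finetune.py:
#   ensure_input_require_grads(model)
```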
Thanks! Is there a comparison of the results?
Was this fine-tuned on Google Colab? What hardware is needed to run it locally?
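On an A100 training runs normally and the loss is non-zero, but it stays around 2.x and never drops below 1.
On a V100 it fails with `expected scalar type Half but found Float`. After removing the fp16 argument it runs, but loss = 0. Changing the learning rate to e-4, e-6, or e-7 made no difference; the loss stays at 0.
I would like to know roughly which parameters and data the author used on the A100, what the loss curve looked like, and how long it took for the loss to fall below 1.
And how should this be trained on a V100?
Thanks to the author.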
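Launched via finetune.sh with ddp set to false; the environment is the conda env created from env.yml.
The model loads successfully, but a precision-related error appears to be raised while computing gradients in the backward pass. What might cause this?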
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 8/8 [00:23<00:00, 2.94s/it]
/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/30699 [00:00<?, ?it/s]/opt/conda/envs/gpt0/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "/mnt/workspace/ChatGLM-chinese-insturct/finetune.py", line 172, in <module>
main()
File "/mnt/workspace/ChatGLM-chinese-insturct/finetune.py", line 163, in main
trainer.train()
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/trainer.py", line 2655, in training_step
self.scaler.scale(loss).backward()
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 456, in backward
grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected scalar type Half but found Float
Full error:
usage: finetune.py [-h] [--dataset_path DATASET_PATH]
[--model_path MODEL_PATH] [--lora_rank LORA_RANK]
--output_dir OUTPUT_DIR
[--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]]
[--do_train [DO_TRAIN]] [--do_eval [DO_EVAL]]
[--do_predict [DO_PREDICT]]
[--evaluation_strategy {no,steps,epoch}]
[--prediction_loss_only [PREDICTION_LOSS_ONLY]]
[--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE]
[--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE]
[--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
[--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--eval_accumulation_steps EVAL_ACCUMULATION_STEPS]
[--eval_delay EVAL_DELAY] [--learning_rate LEARNING_RATE]
[--weight_decay WEIGHT_DECAY] [--adam_beta1 ADAM_BETA1]
[--adam_beta2 ADAM_BETA2] [--adam_epsilon ADAM_EPSILON]
[--max_grad_norm MAX_GRAD_NORM]
[--num_train_epochs NUM_TRAIN_EPOCHS]
[--max_steps MAX_STEPS]
[--lr_scheduler_type {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
[--warmup_ratio WARMUP_RATIO] [--warmup_steps WARMUP_STEPS]
[--log_level {debug,info,warning,error,critical,passive}]
[--log_level_replica {debug,info,warning,error,critical,passive}]
[--log_on_each_node [LOG_ON_EACH_NODE]]
[--no_log_on_each_node] [--logging_dir LOGGING_DIR]
[--logging_strategy {no,steps,epoch}]
[--logging_first_step [LOGGING_FIRST_STEP]]
[--logging_steps LOGGING_STEPS]
[--logging_nan_inf_filter [LOGGING_NAN_INF_FILTER]]
[--no_logging_nan_inf_filter]
[--save_strategy {no,steps,epoch}]
[--save_steps SAVE_STEPS]
[--save_total_limit SAVE_TOTAL_LIMIT]
[--save_on_each_node [SAVE_ON_EACH_NODE]]
[--no_cuda [NO_CUDA]] [--use_mps_device [USE_MPS_DEVICE]]
[--seed SEED] [--data_seed DATA_SEED]
[--jit_mode_eval [JIT_MODE_EVAL]] [--use_ipex [USE_IPEX]]
[--bf16 [BF16]] [--fp16 [FP16]]
[--fp16_opt_level FP16_OPT_LEVEL]
[--half_precision_backend {auto,cuda_amp,apex,cpu_amp}]
[--bf16_full_eval [BF16_FULL_EVAL]]
[--fp16_full_eval [FP16_FULL_EVAL]] [--tf32 TF32]
[--local_rank LOCAL_RANK] [--xpu_backend {mpi,ccl,gloo}]
[--tpu_num_cores TPU_NUM_CORES]
[--tpu_metrics_debug [TPU_METRICS_DEBUG]] [--debug DEBUG]
[--dataloader_drop_last [DATALOADER_DROP_LAST]]
[--eval_steps EVAL_STEPS]
[--dataloader_num_workers DATALOADER_NUM_WORKERS]
[--past_index PAST_INDEX] [--run_name RUN_NAME]
[--disable_tqdm DISABLE_TQDM]
[--remove_unused_columns [REMOVE_UNUSED_COLUMNS]]
[--no_remove_unused_columns]
[--label_names LABEL_NAMES [LABEL_NAMES ...]]
[--load_best_model_at_end [LOAD_BEST_MODEL_AT_END]]
[--metric_for_best_model METRIC_FOR_BEST_MODEL]
[--greater_is_better GREATER_IS_BETTER]
[--ignore_data_skip [IGNORE_DATA_SKIP]]
[--sharded_ddp SHARDED_DDP] [--fsdp FSDP]
[--fsdp_min_num_params FSDP_MIN_NUM_PARAMS]
[--fsdp_transformer_layer_cls_to_wrap FSDP_TRANSFORMER_LAYER_CLS_TO_WRAP]
[--deepspeed DEEPSPEED]
[--label_smoothing_factor LABEL_SMOOTHING_FACTOR]
[--optim {adamw_hf,adamw_torch,adamw_torch_xla,adamw_apex_fused,adafactor,adamw_bnb_8bit,adamw_anyprecision,sgd,adagrad}]
[--optim_args OPTIM_ARGS] [--adafactor [ADAFACTOR]]
[--group_by_length [GROUP_BY_LENGTH]]
[--length_column_name LENGTH_COLUMN_NAME]
[--report_to REPORT_TO [REPORT_TO ...]]
[--ddp_find_unused_parameters DDP_FIND_UNUSED_PARAMETERS]
[--ddp_bucket_cap_mb DDP_BUCKET_CAP_MB]
[--dataloader_pin_memory [DATALOADER_PIN_MEMORY]]
[--no_dataloader_pin_memory]
[--skip_memory_metrics [SKIP_MEMORY_METRICS]]
[--no_skip_memory_metrics]
[--use_legacy_prediction_loop [USE_LEGACY_PREDICTION_LOOP]]
[--push_to_hub [PUSH_TO_HUB]]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
[--hub_model_id HUB_MODEL_ID]
[--hub_strategy {end,every_save,checkpoint,all_checkpoints}]
[--hub_token HUB_TOKEN]
[--hub_private_repo [HUB_PRIVATE_REPO]]
[--gradient_checkpointing [GRADIENT_CHECKPOINTING]]
[--include_inputs_for_metrics [INCLUDE_INPUTS_FOR_METRICS]]
[--fp16_backend {auto,cuda_amp,apex,cpu_amp}]
[--push_to_hub_model_id PUSH_TO_HUB_MODEL_ID]
[--push_to_hub_organization PUSH_TO_HUB_ORGANIZATION]
[--push_to_hub_token PUSH_TO_HUB_TOKEN]
[--mp_parameters MP_PARAMETERS]
[--auto_find_batch_size [AUTO_FIND_BATCH_SIZE]]
[--full_determinism [FULL_DETERMINISM]]
[--torchdynamo {eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--ray_scope RAY_SCOPE] [--ddp_timeout DDP_TIMEOUT]
[--torch_compile [TORCH_COMPILE]]
[--torch_compile_backend {eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--torch_compile_mode {default,reduce-overhead,max-autotune}]
finetune.py: error: the following arguments are required: --output_dir
Process finished with exit code 2
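The usage dump above ends with the actual problem: `--output_dir` is a required argument and was not supplied. Exit code 2 is what `argparse` returns for a command-line error. The sketch below reproduces that behaviour with a stripped-down, hypothetical parser that mirrors only a few of the options shown in the usage text; the default values are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, reduced stand-in for the finetune.py argument parser."""
    parser = argparse.ArgumentParser(prog="finetune.py")
    parser.add_argument("--dataset_path", default="data/example")  # placeholder
    parser.add_argument("--lora_rank", type=int, default=8)        # placeholder
    parser.add_argument("--output_dir", required=True)
    return parser


# Supplying the required argument parses cleanly.
args = build_parser().parse_args(["--output_dir", "output", "--lora_rank", "8"])
print(args.output_dir, args.lora_rank)

# Omitting it makes argparse print the usage text and exit with status 2.
try:
    build_parser().parse_args([])
except SystemExit as exc:
    print("exit code:", exc.code)
```

When the script is started from an IDE rather than through `finetune.sh`, the same arguments need to be added to the run configuration's parameter field.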
ERROR: Could not find a version that satisfies the requirement bitsandbytes==0.37.2 (from versions: 0.31.8, 0.32.0, 0.32.1, 0.32.2, 0.32.3, 0.33.0, 0.33.1, 0.34.0, 0.35.0, 0.35.1, 0.35.2, 0.35.3, 0.35.4, 0.36.0, 0.36.0.post1, 0.36.0.post2, 0.37.0, 0.37.1)
ERROR: No matching distribution found for bitsandbytes==0.37.2
This package failed to install while creating the conda virtual environment. Is it a pip version problem?