
chatglm-llama-chinese-insturct's People


27182812, vxfla



chatglm-llama-chinese-insturct's Issues


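Hi, has GLM been compared against other LLMs on a public dataset? Put differently, is there an evaluation standard for judging how good or bad it is?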


--max_source_length 64
--max_target_length 64
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--gradient_accumulation_steps 16


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to:
CUDA SETUP: CUDA runtime path found: /root/miniconda3/envs/bab/lib/
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|██████████████████████████████████████████| 8/8 [00:09<00:00,  1.23s/it]
Traceback (most recent call last):
  File "/root/autodl-tmp/ChatGLM-chinese-insturct/", line 168, in <module>
  File "/root/autodl-tmp/ChatGLM-chinese-insturct/", line 147, in main
    model = get_peft_model(model, peft_config)
  File "/root/miniconda3/envs/bab/lib/python3.10/site-packages/peft/", line 142, in get_peft_model
    peft_config = _prepare_lora_config(peft_config, model_config)
  File "/root/miniconda3/envs/bab/lib/python3.10/site-packages/peft/", line 117, in _prepare_lora_config
    raise ValueError("Please specify `target_modules` in `peft_config`")
ValueError: Please specify `target_modules` in `peft_config`
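
The ValueError above is raised when peft has no default `target_modules` for the model type, so naming them explicitly in the config is the usual way around it. A minimal sketch, assuming the attention projection in the loaded ChatGLM checkpoint is called `query_key_value` (worth confirming with `model.named_modules()`); the rank, alpha, and dropout values and the helper name `build_peft_config` are illustrative, not the repository's settings.

```python
# Keyword arguments for peft.LoraConfig with target_modules set explicitly.
# ASSUMPTION: "query_key_value" is the attention projection name in the
# loaded ChatGLM checkpoint; r / lora_alpha / lora_dropout are illustrative.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["query_key_value"],
    "task_type": "CAUSAL_LM",
}


def build_peft_config():
    """Build the LoRA config; peft is imported lazily so this file loads without it."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


# Usage inside the training script (requires peft and a loaded model):
#   peft_config = build_peft_config()
#   model = get_peft_model(model, peft_config)
```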

Fine-tuning error: RuntimeError: CUDA error: device-side assert triggered


RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
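
A device-side assert during fine-tuning is often, though not always, an out-of-range index, for example a token or label id that is not smaller than the embedding size. The sketch below sets the environment variable the message recommends and checks a sequence of ids on the CPU. The helper `find_out_of_range_ids` and the sample values are hypothetical, and this cause is an assumption, since the log does not confirm it.

```python
import os

# Must be set before CUDA is initialised (i.e. before importing torch) so the
# failing kernel is reported synchronously, as the error message suggests.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_out_of_range_ids(ids, vocab_size, ignore_index=-100):
    """Return (position, id) pairs that would index past the embedding table.

    ignore_index marks label positions that the loss skips, so it is allowed.
    """
    return [
        (pos, tok)
        for pos, tok in enumerate(ids)
        if tok != ignore_index and not (0 <= tok < vocab_size)
    ]


# Hypothetical example: vocabulary of 10 tokens, one bad id at position 2.
bad = find_out_of_range_ids([1, 5, 12, -100, 9], vocab_size=10)
print(bad)  # [(2, 12)]
```

Running the same check over the tokenized dataset before training can locate a bad sample without waiting for the GPU to fail.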



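Hello, I'm glad to see you have already published fine-tuning results based on the expanded-vocabulary LLaMA. I have a question: according to its training details, the expanded-vocabulary LLaMA was also pre-trained with LoRA, so why don't I see any LoRA parameter layers in the model structure after merging the model? When you fine-tune with LoRA on top of it, do you treat its LLaMA model directly as the base model and initialize a new LoRA structure for fine-tuning? I'd appreciate an answer if you see this. Thanks!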
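
On the first part of the question: when LoRA weights are merged, the low-rank update is added into the base weight matrix, so the merged checkpoint has the same layer structure as a plain model and carries no separate LoRA parameters. A small pure-Python sketch of that arithmetic follows. The matrices and the helper `merge_lora` are made up for illustration, and whether the expanded-vocabulary LLaMA release was merged exactly this way is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (rank, in_features)
B = [[0.5], [1.0]]    # shape (out_features, rank)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

The result has the same shape as `W`, which is why a merged model can serve as an ordinary base model for a freshly initialized LoRA adapter.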




RuntimeError: The size of tensor a (36) must match the size of tensor b (26) at non-singleton dimension 0



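Running the script and then entering the instruction "Generate a list of ten items a person might need for a camping trip"
with the input "你好" ("hello") produces a long string of strange output, as shown below: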

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

│ 273 │ │ │ │ tokens.append(self.sp_tokenizer.decode(single_token_ids)) │
│ 274 │ │ │ return (tokens) │
│ 275 │ │ else: │
│ ❱ 276 │ │ │ if self.pad_token_id in token_ids: # remove pad │
│ 277 │ │ │ │ token_ids = list(filter((self.pad_token_id).ne, token_ids)) │
│ 278 │ │ │ return self.sp_tokenizer.decode(token_ids) │
│ 279 │
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
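
The check `self.pad_token_id in token_ids` compares an int against each element. If `token_ids` is a list of multi-element tensors, each comparison yields a tensor, and taking its truth value raises this error. One possible workaround, assuming that is the cause here, is to convert the ids to plain Python ints before calling decode. `to_id_list` is a hypothetical helper, `FakeTensor` only stands in for a real tensor, and the pad id is made up.

```python
def to_id_list(token_ids):
    """Flatten token ids into a list of plain ints.

    Accepts an int, a (nested) list or tuple, or any object with .tolist(),
    which torch tensors and numpy arrays provide.
    """
    if hasattr(token_ids, "tolist"):
        token_ids = token_ids.tolist()
    if isinstance(token_ids, (list, tuple)):
        flat = []
        for item in token_ids:
            flat.extend(to_id_list(item))
        return flat
    return [int(token_ids)]


class FakeTensor:
    """Stand-in for a tensor so the sketch runs without torch installed."""

    def __init__(self, values):
        self.values = values

    def tolist(self):
        return self.values


ids = to_id_list([FakeTensor([5, 6, 3]), FakeTensor([7, 3])])
pad_token_id = 3  # hypothetical pad id
print([i for i in ids if i != pad_token_id])  # [5, 6, 7]
```

Flattening merges all sequences into one, so for a batched output it is safer to decode each row separately.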


In the llama generation script, "from dataprocess import format_example" looks wrong, doesn't it?



/root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/autograd/ UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
 88%|██████████████████████████████████████████████████████████▋        | 35/40 [01:05<00:05,  1.03s/it]/root/miniconda3/envs/bab/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/autograd/ UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
{'train_runtime': 83.999, 'train_samples_per_second': 0.476, 'train_steps_per_second': 0.476, 'train_loss': 0.0, 'epoch': 10.0}
100%|███████████████████████████████████████████████████████████████████| 40/40 [01:23<00:00,  2.10s/it]


Answer: I'm sorry, I'm not sure what you're asking. Could you please provide more context or clarify your question?


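Answer: Who are you who are you who are you who are you who are you who are you who are you who are you who are you? I am an AI assistant. I cannot perceive the real world, nor can I think and feel the way humans do. I can only answer questions and provide help through text and language. If you have any questions you need help with, feel free to ask me at any time. I will do my best to answer your questions. Thank you very much for asking; I will do my best to answer. If there is anything you need help with, please let me know at any time.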

'ChatGLMForConditionalGeneration' object has no attribute 'enable_input_require_grads'

Running the bash script raises an error; please help take a look.

ChatGLM-chinese-insturct/ in main │
│ │
│ 130 │ │ "THUDM/chatglm-6b", load_in_8bit=True, trust_remote_code=True, device_map=device │
│ 131 │ ) │
│ 132 │ model.gradient_checkpointing_enable() │
│ ❱ 133 │ model.enable_input_require_grads() │
│ 134 │ model.is_parallelizable = True │
│ 135 │ model.model_parallel = True │
│ 136 │ model.lm_head = CastOutputToFloat(model.lm_head)

AttributeError: 'ChatGLMForConditionalGeneration' object has no attribute 'enable_input_require_grads'
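
An AttributeError like this usually suggests that the installed transformers release does not provide `enable_input_require_grads` on the model class; upgrading transformers is the simpler fix. As a fallback, the same effect can be approximated with a forward hook on the input embedding layer. This is a sketch only: `enable_input_grads` is a hypothetical helper, and it assumes the model exposes `get_input_embeddings()`.

```python
def enable_input_grads(model):
    """Make the embedding output require grad so checkpointed blocks get gradients.

    Uses the model's own method when present, otherwise registers an
    equivalent forward hook on the input embedding layer and returns the
    hook handle so it can be removed later.
    """
    if hasattr(model, "enable_input_require_grads"):
        model.enable_input_require_grads()
        return None

    def make_inputs_require_grads(module, inputs, output):
        # Marks the embedding output as requiring grad on every forward pass.
        output.requires_grad_(True)

    return model.get_input_embeddings().register_forward_hook(make_inputs_require_grads)


# Usage in the training script, replacing the failing call:
#   enable_input_grads(model)
```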


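On a V100 this raises: expected scalar type Half but found Float. After removing the fp16 argument it runs normally, but loss=0. Adjusting the learning rate to e-4, e-6, or e-7 makes no difference; the loss is always 0.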

Error: RuntimeError: expected scalar type Half but found Float

The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 8/8 [00:23<00:00, 2.94s/it]
/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/ FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
0%| | 0/30699 [00:00<?, ?it/s]/opt/conda/envs/gpt0/lib/python3.10/site-packages/bitsandbytes/autograd/ UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "/mnt/workspace/ChatGLM-chinese-insturct/", line 172, in
File "/mnt/workspace/ChatGLM-chinese-insturct/", line 163, in main
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/", line 1633, in train
return inner_training_loop(
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/", line 1902, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/transformers/", line 2655, in training_step
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/", line 487, in backward
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/autograd/", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/torch/autograd/", line 274, in apply
return user_fn(self, *args)
File "/opt/conda/envs/gpt0/lib/python3.10/site-packages/bitsandbytes/autograd/", line 456, in backward
grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected scalar type Half but found Float

Error: "error: the following arguments are required: --output_dir". How should I resolve this?

usage: [-h] [--dataset_path DATASET_PATH]
[--model_path MODEL_PATH] [--lora_rank LORA_RANK]
--output_dir OUTPUT_DIR
[--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]]
[--do_train [DO_TRAIN]] [--do_eval [DO_EVAL]]
[--do_predict [DO_PREDICT]]
[--evaluation_strategy {no,steps,epoch}]
[--prediction_loss_only [PREDICTION_LOSS_ONLY]]
[--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE]
[--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE]
[--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
[--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--eval_accumulation_steps EVAL_ACCUMULATION_STEPS]
[--eval_delay EVAL_DELAY] [--learning_rate LEARNING_RATE]
[--weight_decay WEIGHT_DECAY] [--adam_beta1 ADAM_BETA1]
[--adam_beta2 ADAM_BETA2] [--adam_epsilon ADAM_EPSILON]
[--max_grad_norm MAX_GRAD_NORM]
[--num_train_epochs NUM_TRAIN_EPOCHS]
[--max_steps MAX_STEPS]
[--lr_scheduler_type {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
[--warmup_ratio WARMUP_RATIO] [--warmup_steps WARMUP_STEPS]
[--log_level {debug,info,warning,error,critical,passive}]
[--log_level_replica {debug,info,warning,error,critical,passive}]
[--log_on_each_node [LOG_ON_EACH_NODE]]
[--no_log_on_each_node] [--logging_dir LOGGING_DIR]
[--logging_strategy {no,steps,epoch}]
[--logging_first_step [LOGGING_FIRST_STEP]]
[--logging_steps LOGGING_STEPS]
[--logging_nan_inf_filter [LOGGING_NAN_INF_FILTER]]
[--save_strategy {no,steps,epoch}]
[--save_steps SAVE_STEPS]
[--save_total_limit SAVE_TOTAL_LIMIT]
[--save_on_each_node [SAVE_ON_EACH_NODE]]
[--no_cuda [NO_CUDA]] [--use_mps_device [USE_MPS_DEVICE]]
[--seed SEED] [--data_seed DATA_SEED]
[--jit_mode_eval [JIT_MODE_EVAL]] [--use_ipex [USE_IPEX]]
[--bf16 [BF16]] [--fp16 [FP16]]
[--fp16_opt_level FP16_OPT_LEVEL]
[--half_precision_backend {auto,cuda_amp,apex,cpu_amp}]
[--bf16_full_eval [BF16_FULL_EVAL]]
[--fp16_full_eval [FP16_FULL_EVAL]] [--tf32 TF32]
[--local_rank LOCAL_RANK] [--xpu_backend {mpi,ccl,gloo}]
[--tpu_num_cores TPU_NUM_CORES]
[--tpu_metrics_debug [TPU_METRICS_DEBUG]] [--debug DEBUG]
[--dataloader_drop_last [DATALOADER_DROP_LAST]]
[--eval_steps EVAL_STEPS]
[--dataloader_num_workers DATALOADER_NUM_WORKERS]
[--past_index PAST_INDEX] [--run_name RUN_NAME]
[--disable_tqdm DISABLE_TQDM]
[--remove_unused_columns [REMOVE_UNUSED_COLUMNS]]
[--label_names LABEL_NAMES [LABEL_NAMES ...]]
[--load_best_model_at_end [LOAD_BEST_MODEL_AT_END]]
[--metric_for_best_model METRIC_FOR_BEST_MODEL]
[--greater_is_better GREATER_IS_BETTER]
[--ignore_data_skip [IGNORE_DATA_SKIP]]
[--sharded_ddp SHARDED_DDP] [--fsdp FSDP]
[--fsdp_min_num_params FSDP_MIN_NUM_PARAMS]
[--fsdp_transformer_layer_cls_to_wrap FSDP_TRANSFORMER_LAYER_CLS_TO_WRAP]
[--deepspeed DEEPSPEED]
[--label_smoothing_factor LABEL_SMOOTHING_FACTOR]
[--optim {adamw_hf,adamw_torch,adamw_torch_xla,adamw_apex_fused,adafactor,adamw_bnb_8bit,adamw_anyprecision,sgd,adagrad}]
[--optim_args OPTIM_ARGS] [--adafactor [ADAFACTOR]]
[--group_by_length [GROUP_BY_LENGTH]]
[--length_column_name LENGTH_COLUMN_NAME]
[--report_to REPORT_TO [REPORT_TO ...]]
[--ddp_find_unused_parameters DDP_FIND_UNUSED_PARAMETERS]
[--ddp_bucket_cap_mb DDP_BUCKET_CAP_MB]
[--dataloader_pin_memory [DATALOADER_PIN_MEMORY]]
[--skip_memory_metrics [SKIP_MEMORY_METRICS]]
[--use_legacy_prediction_loop [USE_LEGACY_PREDICTION_LOOP]]
[--push_to_hub [PUSH_TO_HUB]]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
[--hub_model_id HUB_MODEL_ID]
[--hub_strategy {end,every_save,checkpoint,all_checkpoints}]
[--hub_token HUB_TOKEN]
[--hub_private_repo [HUB_PRIVATE_REPO]]
[--gradient_checkpointing [GRADIENT_CHECKPOINTING]]
[--include_inputs_for_metrics [INCLUDE_INPUTS_FOR_METRICS]]
[--fp16_backend {auto,cuda_amp,apex,cpu_amp}]
[--push_to_hub_model_id PUSH_TO_HUB_MODEL_ID]
[--push_to_hub_organization PUSH_TO_HUB_ORGANIZATION]
[--push_to_hub_token PUSH_TO_HUB_TOKEN]
[--mp_parameters MP_PARAMETERS]
[--auto_find_batch_size [AUTO_FIND_BATCH_SIZE]]
[--full_determinism [FULL_DETERMINISM]]
[--torchdynamo {eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--ray_scope RAY_SCOPE] [--ddp_timeout DDP_TIMEOUT]
[--torch_compile [TORCH_COMPILE]]
[--torch_compile_backend {eager,aot_eager,inductor,nvfuser,aot_nvfuser,aot_cudagraphs,ofi,fx2trt,onnxrt,ipex}]
[--torch_compile_mode {default,reduce-overhead,max-autotune}]
error: the following arguments are required: --output_dir
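
In the usage text above, `--output_dir OUTPUT_DIR` is the one option printed without square brackets, which is how argparse marks a required argument. Passing it on the command line resolves the error. The sketch below reproduces the behaviour with a stand-in parser; it is not the repository's script, and the program name, the default rank, and the path `output/chatglm-lora` are hypothetical.

```python
import argparse

# Stand-in parser with one required option, mirroring how --output_dir
# appears (without brackets) in the usage text above.
parser = argparse.ArgumentParser(prog="finetune")
parser.add_argument("--output_dir", required=True, help="where checkpoints are written")
parser.add_argument("--lora_rank", type=int, default=8)

# Passing the option satisfies the parser.
args = parser.parse_args(["--output_dir", "output/chatglm-lora"])
print(args.output_dir)  # output/chatglm-lora

# Omitting it makes argparse print the usage text and exit with status 2.
try:
    parser.parse_args([])
except SystemExit as exc:
    print("exit code:", exc.code)  # exit code: 2
```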



ERROR: Could not find a version that satisfies the requirement bitsandbytes==0.37.2 (from versions: 0.31.8, 0.32.0, 0.32.1, 0.32.2, 0.32.3, 0.33.0, 0.33.1, 0.34.0, 0.35.0, 0.35.1, 0.35.2, 0.35.3, 0.35.4, 0.36.0, 0.36.0.post1, 0.36.0.post2, 0.37.0, 0.37.1)
ERROR: No matching distribution found for bitsandbytes==0.37.2

