Comments (4)
Hi, I verified this again on my end and there is no problem:
- Having analyzed the cause: the GLM team has probably updated the model and its internal model scripts on HF again, and the latest model no longer works with peft. This compatibility issue will have to be fixed by the GLM team in future versions of the model. I checked and the official HF page no longer keeps the old GLM3 model, but ModelScope still hosts an older version, so you can download it from there and try that first (see the sketch after this list). If there is still demand, I can upload the stable model to an AutoDL image for everyone to use.
- btw, the original goal of this tutorial is for everyone to learn the HF fine-tuning workflow, mainly how to use peft, not the model's own internals. Our tutorial is like teaching you how to use torch, not how to write the torch operator for some particular exotic convolution. An unstable model is not your fault ~ don't be discouraged. You can try the other tutorials first, and if you still want to try GLM, DM me to join the Q&A group; I will share a stable AutoDL image later.
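For anyone who wants to try the older snapshot right away, here is a minimal sketch of pulling ChatGLM3 from ModelScope and loading it locally. The model id ZhipuAI/chatglm3-6b should match the ModelScope page, but the revision tag and cache path below are assumptions you should adjust to what the page actually lists:

```python
# Minimal sketch: pin an older ChatGLM3 snapshot from ModelScope instead of the
# latest Hugging Face revision. The revision tag and cache_dir are assumptions;
# check the ModelScope model page for the exact tag you want.
from modelscope import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = snapshot_download("ZhipuAI/chatglm3-6b",
                              revision="v1.0.0",            # assumed tag
                              cache_dir="/root/autodl-tmp")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
```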
from self-llm.
@Hongru0306 @KMnO4-zx Hi, I am still learning how to fine-tune and use large language models, and this problem has troubled me for a long time. I found that #37 and #47 describe the same issue, and I could not find a workable answer in either. The problem in #47 is "error during ChatGLM3 fine-tuning", which I believe is actually a failure while saving the model: you can see training stops at step 101 because the save step was configured as 100, so the error is raised as soon as that step is reached. Here are my args:
```python
args = TrainingArguments(
    output_dir="/root/autodl-tmp/self-llm/ChatGLM/output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    logging_steps=20,
    num_train_epochs=1,
    gradient_checkpointing=True,
    save_steps=15,
)
```
Even without gradient_checkpointing=True, saving at the end with trainer.save_model() or model.save_pretrained("my_finetuned_model") produces the same error, TypeError: Object of type set is not JSON serializable. The suggestion in #37 was to "update transformers to 4.37.2"; my version is already 4.37.2, and updating to 4.38.2 does not help either. Meanwhile, with gradient_checkpointing=True set in the args, I get RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn, which I solved by calling model.enable_input_require_grads() before get_peft_model; I do not know whether that is related. I have already spent a lot of time on this error, and I have tried the code in both the md and py files, which are nearly identical. I hope the maintainers can resolve my confusion. Below is the error message, which should be the same as in #47:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 trainer.train()

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/transformers/trainer.py:1624, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1622     hf_hub_utils.enable_progress_bars()
   1623 else:
-> 1624     return inner_training_loop(
   1625         args=args,
   1626         resume_from_checkpoint=resume_from_checkpoint,
   1627         trial=trial,
   1628         ignore_keys_for_eval=ignore_keys_for_eval,
   1629     )

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/transformers/trainer.py:2029, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2026     self.state.epoch = epoch + (step + 1 + steps_skipped) / steps_in_epoch
   2027     self.control = self.callback_handler.on_step_end(args, self.state, self.control)
-> 2029     self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
   2030 else:
   2031     self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/transformers/trainer.py:2423, in Trainer._maybe_log_save_evaluate(self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
   2420     self.lr_scheduler.step(metrics[metric_to_check])
   2422 if self.control.should_save:
-> 2423     self._save_checkpoint(model, trial, metrics=metrics)
   2424     self.control = self.callback_handler.on_save(self.args, self.state, self.control)

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/transformers/trainer.py:2499, in Trainer._save_checkpoint(self, model, trial, metrics)
   2497 else:
   2498     staging_output_dir = os.path.join(run_dir, f"tmp-{checkpoint_folder}")
-> 2499 self.save_model(staging_output_dir, _internal_call=True)
   2501 if not self.args.save_only_model:
   2502     # Save optimizer and scheduler
   2503     self._save_optimizer_and_scheduler(staging_output_dir)

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/transformers/trainer.py:3016, in Trainer.save_model(self, output_dir, _internal_call)
   3013     self.model_wrapped.save_checkpoint(output_dir)
   3015 elif self.args.should_save:
-> 3016     self._save(output_dir)
   3018 # Push to the Hub when `save_model` is called by the user.
   3019 if self.args.push_to_hub and not _internal_call:

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/transformers/trainer.py:3089, in Trainer._save(self, output_dir, state_dict)
   3087     torch.save(state_dict, os.path.join(output_dir, WEIGHTS_NAME))
   3088 else:
-> 3089     self.model.save_pretrained(
   3090         output_dir, state_dict=state_dict, safe_serialization=self.args.save_safetensors
   3091     )
   3093 if self.tokenizer is not None:
   3094     self.tokenizer.save_pretrained(output_dir)

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/peft/peft_model.py:201, in PeftModel.save_pretrained(self, save_directory, safe_serialization, selected_adapters, **kwargs)
    198 else:
    199     auto_mapping_dict = None
--> 201 peft_config.save_pretrained(output_dir, auto_mapping_dict=auto_mapping_dict)
    202 peft_config.inference_mode = inference_mode

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/site-packages/peft/utils/config.py:92, in PeftConfigMixin.save_pretrained(self, save_directory, **kwargs)
     90 # save it
     91 with open(output_path, "w") as writer:
---> 92     writer.write(json.dumps(output_dict, indent=2, sort_keys=True))

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/json/__init__.py:238, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
    234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
--> 238     **kw).encode(obj)

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/json/encoder.py:201, in JSONEncoder.encode(self, o)
    199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
--> 201     chunks = list(chunks)
    202 return ''.join(chunks)

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/json/encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    429     yield from _iterencode_list(o, _current_indent_level)
    430 elif isinstance(o, dict):
--> 431     yield from _iterencode_dict(o, _current_indent_level)
    432 else:
    433     if markers is not None:

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/json/encoder.py:405, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    403 else:
    404     chunks = _iterencode(value, _current_indent_level)
--> 405 yield from chunks
    406 if newline_indent is not None:
    407     _current_indent_level -= 1

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/json/encoder.py:438, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    436     raise ValueError("Circular reference detected")
    437 markers[markerid] = o
--> 438 o = _default(o)
    439 yield from _iterencode(o, _current_indent_level)
    440 if markers is not None:

File ~/miniconda3/envs/ma_code_interpreter/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
    (...)
    177
    178     """
--> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
    180                 f'is not JSON serializable')

TypeError: Object of type set is not JSON serializable
```
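For reference, a minimal sketch of the two workarounds mentioned above, assuming a LoRA setup like the tutorial's. The target_modules value, the adapter name "default", and the save path are assumptions, and casting set-valued config fields to lists is only a guess at why json.dumps fails here, not a confirmed fix:

```python
from peft import LoraConfig, get_peft_model

# Workaround 1 (needed when gradient_checkpointing=True): make inputs require
# grads before wrapping the model with PEFT, otherwise backward() has no graph.
model.enable_input_require_grads()
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM",
                                         target_modules=["query_key_value"]))

# ... trainer.train() ...

# Workaround 2 (assumption: the non-serializable value is a set-valued field
# such as target_modules in the adapter config): cast sets to lists so that
# peft_config.save_pretrained -> json.dumps no longer raises
# "Object of type set is not JSON serializable".
peft_cfg = model.peft_config["default"]   # "default" is PEFT's default adapter name
for name, value in vars(peft_cfg).items():
    if isinstance(value, set):
        setattr(peft_cfg, name, sorted(value))
model.save_pretrained("my_finetuned_model")
```

Upgrading peft to a more recent release may also make the cast unnecessary, since newer versions handle set-valued fields when serializing the adapter config.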
There have been quite a few GLM issues recently; I will rent a fresh platform instance and take another look. GLM already gave us plenty of trouble during debugging, and most likely they changed their model again afterwards, which caused this series of problems. You can first go through the Qwen or Deepseek model tutorials to complete the whole workflow, and set aside this fight with a particular model version for now.
from self-llm.
Regarding the stable-model issue, we are also discussing further feasible solutions; please stay tuned.
from self-llm.
Thanks to the maintainers, and thank you for the tutorials.
from self-llm.
Related Issues (20)
- [Not an issue! Discussion] The difference between LoRA fine-tuning Qwen and simply presetting behavior via the system prompt seems small HOT 3
- Qwen1.5-7B-Chat vLLM deployment and invocation - speed test: hf command error HOT 1
- Qwen1.5-7B LoRA fine-tuning error HOT 1
- Fine-tuned model produces impolite or offensive language HOT 3
- While fine-tuning LLAMA3 I get NotImplementedError: Cannot copy out of meta tensor; no data! HOT 8
- Qwen1.5-7B LoRA fine-tuning error: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn HOT 2
- [XVERSE-7B-chat WebDemo deployment] error: torch.cuda.OutOfMemoryError: CUDA out of memory. HOT 2
- llama3 API invocation issue HOT 1
- LoRA fine-tuning of llama3 errors: NotImplementedError: Cannot copy out of meta tensor; no data! HOT 3
- chatglm3 LoRA fine-tuning error HOT 1
- Can it run on pure CPU, e.g. a Mac without CUDA? HOT 1
- 04-Qwen-7B-Chat LoRA fine-tuning error HOT 1
- Request for deepseek-v2 deployment HOT 2
- Are there deployment/fine-tuning docs for multimodal LLMs, and are there plans to add them? HOT 1
- llama3 api error HOT 1
- Fine-tuning Qwen1.5-0.5b errors: PermissionError: [Errno 13] Permission denied: './output/Qwen1.5\checkpoint-100' HOT 5
- peft fine-tuning of llama3 8b: loss stays 0 from step 10 onward HOT 5
- Error during model fine-tuning: kernel version issue, Detected kernel version 5.4.0 HOT 1
- 如懿传 (Ruyi's Royal Love in the Palace)
- Building wheel for flash-attn (setup.py) ... - stuck. HOT 4
from self-llm.