Comments (8)
You can start by setting save_strategy=5 and see where the output path ends up. By "curl", do you mean calling the model with your own LoRA loaded? That requires merging the model first and pushing the relevant parts to ModelScope or HF.
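For reference, a minimal sketch of that merge step, assuming the HF model id THUDM/chatglm3-6b and a hypothetical adapter checkpoint path (the tutorial's actual paths may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model the LoRA adapter was trained on.
base = AutoModelForCausalLM.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# Attach the trained adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, "./output/checkpoint-466")  # hypothetical path
merged = model.merge_and_unload()

# Save the merged model locally; this directory is what gets uploaded to
# ModelScope or HF, and what the FastApi demo would then load.
merged.save_pretrained("./chatglm3-6b-merged")
tokenizer.save_pretrained("./chatglm3-6b-merged")
```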
OK, I'll try again. When I called save_pretrained() before, it kept failing with an error complaining about something not being in JSON format.
By "curl" I mean the method described in the "ChatGLM3-6B FastApi 部署调用" chapter. That shouldn't be a big problem, though; the main blocker is the previous step.
Thanks for the reply!
> You can start by setting save_strategy=5 and see where the output path ends up. By "curl", do you mean calling the model with your own LoRA loaded? That requires merging the model first and pushing the relevant parts to ModelScope or HF.

The parameter is now changed to save_strategy='epoch', but I still get an error after adding it; the message is below. It feels like a version issue, although I installed all the Python packages at the versions given in the project examples.
Traceback (most recent call last):
File "train.py", line 79, in
trainer.train()
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1944, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2302, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2378, in _save_checkpoint
self.save_model(staging_output_dir, _internal_call=True)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2886, in save_model
self._save(output_dir)
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2958, in _save
self.model.save_pretrained(
File "/root/miniconda3/lib/python3.8/site-packages/peft/peft_model.py", line 201, in save_pretrained
peft_config.save_pretrained(output_dir, auto_mapping_dict=auto_mapping_dict)
File "/root/miniconda3/lib/python3.8/site-packages/peft/utils/config.py", line 92, in save_pretrained
writer.write(json.dumps(output_dict, indent=2, sort_keys=True))
File "/root/miniconda3/lib/python3.8/json/init.py", line 234, in dumps
return cls(
File "/root/miniconda3/lib/python3.8/json/encoder.py", line 201, in encode
chunks = list(chunks)
File "/root/miniconda3/lib/python3.8/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/root/miniconda3/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/root/miniconda3/lib/python3.8/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/root/miniconda3/lib/python3.8/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
100%|██████████| 466/466 [07:32<00:00, 1.03it/s]
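For what it's worth, the traceback shows json.dumps failing on a Python set inside the PEFT config when the trainer checkpoints, which matches a known issue in some peft versions where target_modules ends up stored as a set. A hedged workaround, assuming that field is the culprit here (upgrading peft is the other obvious fix); the target_modules value below assumes ChatGLM3's fused attention layer name:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# Pass target_modules as a list, not a set, so the config stays JSON-serializable.
config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # assumed: ChatGLM3's attention projection
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(base, config)

# Defensive check: if the stored field is still a set, coerce it to a list
# before the trainer's first checkpoint save.
peft_cfg = model.peft_config["default"]
if isinstance(peft_cfg.target_modules, set):
    peft_cfg.target_modules = list(peft_cfg.target_modules)
```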
@waynetest2024 I'd like to know how large a model you fine-tuned, and roughly how long it took with how much data? With 10k training samples, LoRA fine-tuning qwen1.5-32b-chat on an A6000 is painfully slow for me... I set the batch size to 16, and a single batch takes nearly a minute.
> @waynetest2024 I'd like to know how large a model you fine-tuned, and roughly how long it took with how much data?

Just the model and data from the demo: chatglm3-6b and huanhuan.json. A full run takes a few minutes on a 4090. I'm only walking through the basic workflow, so my requirements are modest.
> The parameter is now changed to save_strategy='epoch', but I still get an error after adding it...

Hi, for testing you don't need to set it to epoch. Have it save by iteration instead, every 5 iterations, and see whether that works. If it really can't be resolved, I'll later build a known-good environment, push it to AutoDL, and add the link to the relevant part of the repo.
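One note: in the transformers version the traceback shows, save_strategy only accepts the strings 'no', 'epoch', or 'steps'; saving every 5 iterations is spelled save_strategy='steps' plus save_steps=5, which would also explain why a bare save_strategy=5 errors out immediately. A sketch with placeholder values:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output",      # placeholder; use the tutorial's actual path
    save_strategy="steps",      # takes a string, not a number
    save_steps=5,               # checkpoint every 5 optimizer steps
    per_device_train_batch_size=1,
    num_train_epochs=1,
)
```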
Another possible approach: train with the provided ipynb, then manually save the model weights after training finishes and check where the save path points.
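A minimal sketch of that manual save, assuming the notebook's trainer is still in scope after training (the directory name is made up):

```python
import os

# Save the LoRA adapter weights by hand and print the absolute path,
# so there is no ambiguity about where the checkpoint landed.
save_dir = "./output/manual_save"  # hypothetical directory
trainer.model.save_pretrained(save_dir)
print("saved to:", os.path.abspath(save_dir))
```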
> The parameter is now changed to save_strategy='epoch', but I still get an error after adding it...

> Hi, for testing you don't need to set it to epoch. Have it save by iteration instead, every 5 iterations, and see whether that works.

I see, but setting save_strategy=5 immediately throws an error.