
yangjianxin1 / firefly-llama2-chinese

369 stars · 11 watchers · 24 forks · 2.07 MB

Firefly Chinese LLaMA-2 large model, supporting incremental pretraining of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models

Python 100.00%
firefly llama llama-2 llama2 llm baichuan baichuan-13b bloom chatglm falcon

firefly-llama2-chinese's Introduction

Hi there 👋, I'm Yang Jianxin

yangjianxin1's GitHub stats

I'm an NLPer interested in large language models, and I graduated from SYSU with a master's degree.

In my free time, I like to write technical blog posts on my WeChat Official Account (YeungNLP) and on Zhihu (红雨瓢泼).

🔭 Experiences:

  • Shopee: building NLP algorithm capabilities for customer service (2022-04 to present).
  • Tencent: building NLP algorithm capabilities for product understanding (2021-06 to 2022-04).
  • Alibaba: internship (2020-06 to 2020-09).

⚙ Here are some of my public projects:

  • Firefly: One-stop training for LLMs. Some achievements:
      1. firefly-llama2-13b ranked 3rd among all 13B models on the Open LLM Leaderboard, only 0.5 points below 1st.
      2. firefly-llama-30b ranked 10th among all 30B models on the Open LLM Leaderboard, trained on a single V100.
      3. firefly-baichuan-13b has over 1.63 million downloads.
      4. firefly-qwen1.5-en-7b-dpo improves on the official chat model by 7.21 points.
      5. firefly-gemma-7b improves on the official chat model by 9.37 points.
  • GPT2-chitchat: Chinese GPT-2 model for chitchat.
  • Firefly-LLaMA2-Chinese: Chinese Llama2 trained with an efficient and effective method.
  • LongQLoRA: Efficient and effective method for extending the context length of Llama2 to 8192 on a single V100 (see the Technical Report).
  • CPM: Chinese composition model based on CPM.
  • CLIP-Chinese: Chinese CLIP model trained on 1.4 million image-text pairs.
  • ClipCap-Chinese: Chinese image-captioning model based on CLIP and Mengzi.
  • OFA-Chinese: Chinese multi-modal unified pre-training model.
  • LLMPruner: Prunes the vocabulary of LLMs to save memory during training.

📁 Here are some of my technical blogs:

firefly-llama2-chinese's People

Contributors

yangjianxin1


firefly-llama2-chinese's Issues

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Error operation not supported at line 351 in file /home/tim/git/bitsandbytes/csrc/pythonInterface.c
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 31779) of binary: /root/miniconda3/envs/chatglm_ft/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/chatglm_ft/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

CUDA:11.7
CentOS Linux release 7.7.1908 (Core)
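
The "operation not supported" error raised from bitsandbytes' pythonInterface.c usually indicates that the installed bitsandbytes binary doesn't match the local CUDA setup (for example, a CPU-only build or a CUDA-version mismatch). A minimal diagnostic sketch, assuming a standard bitsandbytes install; this is illustrative, not code from this repo:

    import torch
    import bitsandbytes as bnb

    # bitsandbytes' GPU kernels raise "operation not supported" when the
    # library falls back to a CPU-only or mismatched-CUDA build.
    print("CUDA available:", torch.cuda.is_available())
    print("Torch CUDA version:", torch.version.cuda)

    # Smoke test: one 8-bit optimizer step. The parameter is deliberately
    # larger than bitsandbytes' min_8bit_size so the 8-bit path is used.
    p = torch.nn.Parameter(torch.randn(256, 256, device="cuda"))
    opt = bnb.optim.Adam8bit([p], lr=1e-3)
    p.grad = torch.randn_like(p)
    opt.step()
    print("bitsandbytes 8-bit optimizer step OK")

If the smoke test fails, reinstalling bitsandbytes built against the CUDA 11.7 runtime reported above is the usual first step.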

AttributeError: 'NoneType' object has no attribute 'get'

torchrun --nproc_per_node=1 train.py --train_args_file train_args/Glm.yaml
Traceback (most recent call last):
  File "/home/yierde/anaconda3/envs/tn/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 237, in launch_agent
    result = agent.run()
             ^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
    result = self._invoke_run(role)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 844, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 681, in _initialize_workers
    worker_ids = self._start_workers(worker_group)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/local_elastic_agent.py", line 271, in _start_workers
    self._pcontext = start_processes(
                     ^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/__init__.py", line 207, in start_processes
    redirs = to_map(redirects, nprocs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 162, in to_map
    map[i] = val_or_map.get(i, Std.NONE)
             ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

How do I fix this? I just can't get it to run...
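
This crash happens before any training code runs: torchrun's elastic launcher passes None to to_map as the stdout/stderr redirect mapping. It is most commonly reported when an older torch build runs under Python 3.11, which matches the python3.11 paths in this traceback. A hedged version-check sketch; the exact compatibility cutoff is an assumption, so verify against the official torch release notes:

    import sys
    import torch

    print("Python:", sys.version.split()[0])
    print("torch:", torch.__version__)

    # Heuristic (assumption): official Python 3.11 wheels arrived with
    # torch 2.0, so older builds under 3.11 can hit launcher bugs like this.
    major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
    if sys.version_info >= (3, 11) and (major, minor) < (2, 0):
        print("Likely mismatch: upgrade torch, or recreate the environment "
              "with Python 3.10 and reinstall the requirements.")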

Problem saving the fine-tuned model

@yangjianxin1, hello. After running torchrun --nproc_per_node={num_gpus} train.py --train_args_file train_args/llama2-13b-ext.yaml, full-parameter fine-tuning finished, but why was the fine-tuned model not saved under the output folder? The output folder only contains some training-argument files, with no model files.
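
If train.py builds on the Hugging Face Trainer, checkpoints are only written according to save_strategy/save_steps, and the final weights only appear if the script calls an explicit save at the end. A minimal sketch of the relevant knobs, assuming standard transformers Trainer usage; the values are illustrative, not the repo's actual YAML:

    from transformers import Trainer, TrainingArguments

    # Illustrative values; the real ones live in the train_args YAML file.
    args = TrainingArguments(
        output_dir="output/llama2-13b-ext",
        save_strategy="steps",   # write intermediate checkpoints
        save_steps=500,
        save_total_limit=2,
    )

    # model and train_dataset are placeholders for objects built in train.py.
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()

    # Without an explicit final save, output_dir may hold only argument files
    # and intermediate checkpoints; this writes the final weights and config.
    trainer.save_model(args.output_dir)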

About training details

Hello, when doing instruction fine-tuning, how is the validation set constructed, and roughly how large is it?
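
The issue thread doesn't record an answer; a common way to build such a validation set (an assumption, not the author's confirmed method) is to hold out a small random slice of the instruction data, for example:

    import json
    import random

    # Hypothetical file name; substitute the actual instruction-tuning data.
    with open("data/instruction_data.jsonl", encoding="utf-8") as f:
        samples = [json.loads(line) for line in f]

    random.seed(42)
    random.shuffle(samples)

    # Hold out roughly 1%, capped at a few thousand examples.
    n_val = min(len(samples) // 100, 2000)
    val_set, train_set = samples[:n_val], samples[n_val:]
    print(f"train: {len(train_set)}, val: {len(val_set)}")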

Data-processing module problem: train_texts has length 1

Hello, I have a question about the data-processing module. There are five txt files under my pretraining-data path, and they all load normally, as shown below:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:13<00:00, 2.68s/it]
2024-01-22 17:31:05.728 | INFO | component.dataset:load_dataset:120 - Total num of training text: 5
2024-01-22 17:31:05.728 | INFO | component.dataset:load_dataset:123 - Start tokenizing data ...
0%| | 0/1 [00:00<?, ?it/s]
As you can see, after the data is loaded and tokenization starts, the total count is only 1. I looked at component/dataset.py and printed the length of train_texts inside load_dataset; the result is 1. In other words, when all the data is added to the train_texts list, an extra layer of [] gets wrapped around it. Doesn't that make the step size wrong when tokenizing with for i in tqdm(range(0, len(train_texts), self.tokenize_batch))? This step also frequently runs out of memory in my tests (320 GB of RAM), which looks like the loop going wrong. I don't understand why there is an extra list layer; at which step of reading the txt files does it happen?
If I change the code in load_dataset to train_texts = train_texts[0], the total shown by tqdm looks correct, but will the data still be processed correctly? I hope to get an answer.
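
The symptom (an extra list layer and a length of 1) matches appending a list of texts instead of extending with it. A minimal sketch of the suspected pattern and its fix, with hypothetical variable names rather than the repo's exact code:

    from pathlib import Path

    def load_texts(data_dir: str) -> list[str]:
        train_texts: list[str] = []
        for path in sorted(Path(data_dir).glob("*.txt")):
            lines = path.read_text(encoding="utf-8").splitlines()
            # Bug pattern: train_texts.append(lines) nests the whole file
            # (or corpus) as a single element, so len(train_texts) is wrong.
            # extend() keeps the list flat:
            train_texts.extend(lines)
        return train_texts

    texts = load_texts("data/pretrain")  # hypothetical path
    print(len(texts))  # total number of text rows, not 1

With a flat list, range(0, len(train_texts), self.tokenize_batch) steps over real examples instead of handing the tokenizer the entire corpus as one batch, which would also explain the out-of-memory behavior; indexing train_texts[0] papers over the nesting but leaves the root cause in the loader.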
