internlm / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

License: Apache License 2.0

Python 100.00%
baichuan chatglm2 internlm large-language-models llama2 llm llm-training peft qwen chatbot

xtuner's Introduction

InternLM

👋 join us on Discord and WeChat

Introduction

InternLM2 series are released with the following features:

  • 200K Context window: Nearly perfect at finding needles in the haystack with 200K-long context, with leading performance on long-context tasks like LongBench and L-Eval. Try it with LMDeploy for 200K-context inference.

  • Outstanding comprehensive performance: Significantly better than the last generation in all dimensions, especially in reasoning, math, code, chat experience, instruction following, and creative writing, with leading performance among open-source models in similar sizes. In some evaluations, InternLM2-Chat-20B may match or even surpass ChatGPT (GPT-3.5).

  • Code interpreter & Data analysis: With the code interpreter, InternLM2-Chat-20B obtains comparable performance with GPT-4 on GSM8K and MATH. InternLM2-Chat also provides data analysis capability.

  • Stronger tool use: With stronger instruction following, tool selection, and reflection capabilities, InternLM2 can support more kinds of agents and multi-step tool calling for complex tasks. See examples.

News

[2024.03.26] We release InternLM2 technical report. See arXiv for details.

[2024.01.31] We release InternLM2-1.8B, along with the associated chat model. They provide a cheaper deployment option while maintaining leading performance.

[2024.01.23] We release InternLM2-Math-7B and InternLM2-Math-20B with pretraining and SFT checkpoints. They surpass ChatGPT with small sizes. See InternLM-Math for details and download.

[2024.01.17] We release InternLM2-7B and InternLM2-20B and their corresponding chat models with stronger capabilities in all dimensions. See model zoo below for download or model cards for more details.

[2023.12.13] InternLM-7B-Chat and InternLM-20B-Chat checkpoints are updated. With an improved finetuning strategy, the new chat models can generate higher quality responses with greater stylistic diversity.

[2023.09.20] InternLM-20B is released with base and chat versions.

Model Zoo

Model Transformers(HF) ModelScope(HF) OpenXLab(HF) OpenXLab(Origin) Release Date
InternLM2-1.8B 🤗internlm2-1.8b internlm2-1.8b Open in OpenXLab Open in OpenXLab 2024-01-31
InternLM2-Chat-1.8B-SFT 🤗internlm2-chat-1.8b-sft internlm2-chat-1.8b-sft Open in OpenXLab Open in OpenXLab 2024-01-31
InternLM2-Chat-1.8B 🤗internlm2-chat-1.8b internlm2-chat-1.8b Open in OpenXLab Open in OpenXLab 2024-02-19
InternLM2-Base-7B 🤗internlm2-base-7b internlm2-base-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-7B 🤗internlm2-7b internlm2-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-7B-SFT 🤗internlm2-chat-7b-sft internlm2-chat-7b-sft Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-7B 🤗internlm2-chat-7b internlm2-chat-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Base-20B 🤗internlm2-base-20b internlm2-base-20b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-20B 🤗internlm2-20b internlm2-20b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-20B-SFT 🤗internlm2-chat-20b-sft internlm2-chat-20b-sft Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-20B 🤗internlm2-chat-20b internlm2-chat-20b Open in OpenXLab Open in OpenXLab 2024-01-17

Notes:

The InternLM2 series is released in two model sizes: 7B and 20B. The 7B models are efficient for research and application, while the 20B models are more powerful and can support more complex scenarios. The relationship between these models is as follows.

  1. InternLM2-Base: Foundation models with high quality and high adaptation flexibility, which serve as a good starting point for downstream deep adaptations.
  2. InternLM2: Further pretrained on general-domain data and a domain-enhanced corpus, achieving state-of-the-art performance in evaluations along with strong language capability. InternLM2 models are recommended for most applications.
  3. InternLM2-Chat-SFT: Intermediate version of InternLM2-Chat that only undergoes supervised fine-tuning (SFT), based on the InternLM2-Base model. We release them to benefit research on alignment.
  4. InternLM2-Chat: Further aligned on top of InternLM2-Chat-SFT through online RLHF. InternLM2-Chat exhibits better instruction following, chat experience, and function calling, and is recommended for downstream applications.

Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

Supplements: HF refers to the format used by HuggingFace in transformers, whereas Origin denotes the format adopted by the InternLM team in InternEvo.

Performance

Objective Evaluation

Dataset Baichuan2-7B-Chat Mistral-7B-Instruct-v0.2 Qwen-7B-Chat InternLM2-Chat-7B ChatGLM3-6B Baichuan2-13B-Chat Mixtral-8x7B-Instruct-v0.1 Qwen-14B-Chat InternLM2-Chat-20B
MMLU 50.1 59.2 57.1 63.7 58.0 56.6 70.3 66.7 66.5
CMMLU 53.4 42.0 57.9 63.0 57.8 54.8 50.6 68.1 65.1
AGIEval 35.3 34.5 39.7 47.2 44.2 40.0 41.7 46.5 50.3
C-Eval 53.9 42.4 59.8 60.8 59.1 56.3 54.0 71.5 63.0
TriviaQA 37.6 35.0 46.1 50.8 38.1 40.3 57.7 54.5 53.9
NaturalQuestions 12.8 8.1 18.6 24.1 14.0 12.7 22.5 22.9 25.9
C3 78.5 66.9 84.4 91.5 79.3 84.4 82.1 91.5 93.5
CMRC 8.1 5.6 14.6 63.8 43.2 27.8 5.3 13.0 50.4
WinoGrande 49.9 50.8 54.2 65.8 61.7 50.9 60.9 55.7 74.8
BBH 35.9 46.5 45.5 61.2 56.0 42.5 57.3 55.8 68.3
GSM-8K 32.4 48.3 44.1 70.7 53.8 56.0 71.7 57.7 79.6
Math 5.7 8.6 12.0 23.0 20.4 4.3 22.5 27.6 31.9
HumanEval 17.7 35.4 36.0 59.8 52.4 19.5 37.8 40.9 67.1
MBPP 37.7 25.7 33.9 51.4 55.6 40.9 40.9 30.0 65.8
  • Performance of MBPP is reported with MBPP(Sanitized)

Alignment Evaluation

  • We have evaluated our model on AlpacaEval 2.0, and InternLM2-Chat-20B surpasses Claude 2, GPT-4 (0613), and Gemini Pro.
Model Name Win Rate Length
GPT-4 Turbo 50.00% 2049
GPT-4 23.58% 1365
GPT-4 0314 22.07% 1371
Mistral Medium 21.86% 1500
XwinLM 70b V0.1 21.81% 1775
InternLM2 Chat 20B 21.75% 2373
Mixtral 8x7B v0.1 18.26% 1465
Claude 2 17.19% 1069
Gemini Pro 16.85% 1315
GPT-4 0613 15.76% 1140
Claude 2.1 15.73% 1096
  • Results are taken from the leaderboard released on 2024-01-17.

Requirements

  • Python >= 3.8
  • PyTorch >= 1.12.0 (2.0.0 and above are recommended)
  • Transformers >= 4.34

Usages

We briefly show the usage with Transformers, ModelScope, and web demos. The chat models adopt the ChatML format (see the sketch below) to support both chat and agent applications. For the best results, please make sure that the installed transformers library version meets the following requirement before performing inference with Transformers or ModelScope:

transformers >= 4.34
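A minimal sketch of the ChatML-style prompt the chat models expect, assuming the chat tokenizer ships a chat template (exposed through `apply_chat_template` in transformers >= 4.34); the exact special tokens come from the model's remote code, so treat the printed output as illustrative:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
messages = [
    {"role": "user", "content": "hello"},
]
# Render the conversation into the ChatML-style prompt string used by the chat models.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # expected to contain <|im_start|>user ... <|im_end|> style markers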

Import from Transformers

To load the InternLM2-7B-Chat model using Transformers, use the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-7b", device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) On low-resource devices, you can load the model in 4-bit or 8-bit via bitsandbytes to further save GPU memory.
#   InternLM 7B in 4-bit costs nearly 8GB of GPU memory.
#   pip install -U bitsandbytes
#   8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
#   4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Output: Hello? How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
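The chat model's remote code also exposes a streaming interface. A minimal sketch, assuming the `stream_chat` method provided by InternLM's remote modeling code (each iteration yields the response so far, so only the newly generated suffix is printed):

# Streaming generation with the `model` and `tokenizer` loaded above.
length = 0
for response, history in model.stream_chat(tokenizer, "hello", history=[]):
    print(response[length:], flush=True, end="")
    length = len(response)
print()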

Import from ModelScope

To load the InternLM2-7B-Chat model using ModelScope, use the following code:

import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-7b')
tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) On low-resource devices, you can load the model in 4-bit or 8-bit via bitsandbytes to further save GPU memory.
#   InternLM 7B in 4-bit costs nearly 8GB of GPU memory.
#   pip install -U bitsandbytes
#   8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
#   4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)

Dialogue

You can interact with the InternLM Chat 7B model through a frontend interface by running the following commands:

pip install streamlit
pip install "transformers>=4.34"
streamlit run ./chat/web_demo.py

Deployment

We use LMDeploy for fast deployment of InternLM.

With only 4 lines of code, you can perform internlm2-chat-7b inference after pip install lmdeploy>=0.2.1.

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Please refer to the guidance for more details on model deployment. For additional deployment tutorials, feel free to explore here.
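Sampling parameters can be passed to the pipeline through GenerationConfig. A minimal sketch (the parameter values below are illustrative, not recommended defaults):

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm2-chat-7b")
# Illustrative sampling settings; tune them for your use case.
gen_config = GenerationConfig(top_p=0.8, temperature=0.7, max_new_tokens=256)
response = pipe(["Hi, pls intro yourself", "Shanghai is"], gen_config=gen_config)
print(response)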

200K-long-context Inference

By turning on the Dynamic NTK feature of LMDeploy, you can run long-context inference.

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=200000)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
response = pipe(prompt)
print(response)

Agent

InternLM2-Chat models have excellent tool utilization capabilities and can work with function calls in a zero-shot manner. See more examples in the agent section.

Fine-tuning

Please refer to finetune docs for fine-tuning with InternLM.

Note: We have migrated all training functionality in this project to InternEvo for a better user experience; InternEvo provides efficient pre-training and fine-tuning infrastructure for training InternLM.

Evaluation

We utilize OpenCompass for model evaluation. For InternLM2, we primarily focus on standard objective evaluation, long-context evaluation (needle in a haystack), data contamination assessment, agent evaluation, and subjective evaluation.

Objective Evaluation

To evaluate the InternLM model, please follow the guidelines in the OpenCompass tutorial. Typically, we use ppl for multiple-choice questions on the Base model and gen for all questions on the Chat model.
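To make the distinction concrete, below is a minimal, self-contained sketch of ppl-style scoring for a multiple-choice question. It only illustrates the idea of ranking candidate answers by language-model loss; it is not OpenCompass's implementation, and real evaluations typically score only the continuation tokens rather than the whole sequence:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "internlm/internlm2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
model = model.eval()

question = "Which city is the capital of France?\nA. Berlin\nB. Paris\nC. Rome\nD. Madrid\nAnswer:"
options = [" A", " B", " C", " D"]

# ppl-style: score each candidate answer by the model's cross-entropy loss and pick the lowest,
# instead of generating free-form text and parsing it (the gen-style approach used for chat models).
losses = []
for option in options:
    ids = tokenizer(question + option, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        losses.append(model(input_ids=ids, labels=ids).loss.item())
print("Predicted:", options[losses.index(min(losses))].strip())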

Long-Context Evaluation (Needle in a Haystack)

For the Needle in a Haystack evaluation, refer to the tutorial provided in the documentation. Feel free to try it out.

Data Contamination Assessment

To learn more about data contamination assessment, please check the contamination eval.

Agent Evaluation

  • To evaluate tool utilization, please refer to T-Eval.
  • For code interpreter evaluation, use the Math Agent Evaluation provided in the repository.

Subjective Evaluation

  • Please follow the tutorial for subjective evaluation.

Contribution

We appreciate all the contributors for their efforts to improve and enhance InternLM. Community users are highly encouraged to participate in the project. Please refer to the contribution guidelines for instructions on how to contribute to the project.

License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].

Citation

@misc{cai2024internlm2,
      title={InternLM2 Technical Report},
      author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},
      year={2024},
      eprint={2403.17297},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

xtuner's People

Contributors

ajupyter, amulil, del-zhenwu, dumoedss, eltociear, fanqino1, gzlong96, hhaandroid, hit-cwh, humu789, jianxindong, kevinnunu, kmno4-zx, koosung, liuyanyi, lkjacky, lzhgrla, maxchiron, ooooo-create, pommespeter, pppppm, rangilyu, tpoisonooo, xiaohangguo


xtuner's Issues

About training needs

1. Dataset formats
Currently I am using two dataset formats, "text_only" and "text2text":

{
    "type": "text_only",
    "instances": [
        {
            "text": "……"
        },
        {
            ……
        }
    ]
}
{
    "type": "text2text",
    "instances": [
        {
            "input": "……",
            "output": ""
        },
        {
            ……
        }
    ]
}

There are two kinds of prompts:

1. ###instruction: {} ###input:{} ###output:{}
2. ###Human: instruction:{} input:{} ###Assistant:{}

For multi-turn dialogue corpora, the text_only format simply repeats the ###instruction: {} ###input:{} ###output:{} pattern multiple times.
2. Base models
(1) CodeLlama series
https://huggingface.co/codellama/CodeLlama-7b-Python-hf
https://huggingface.co/Phind/Phind-CodeLlama-34B-v1
……
Phind, fine-tuned from CodeLlama: https://huggingface.co/Phind/Phind-CodeLlama-34B-v1
(2) StarCoder series
https://huggingface.co/bigcode/starcoder
(3) Wizard series
https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0
3. ZeRO-3 and ZeRO-2 should both be worth trying, and LoRA as well, but since I plan to rent A100s, I will probably run full-parameter fine-tuning.

arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit

(xtuner) ➜  xtuner git:(main) ✗ xtuner train internlm_7b_qlora_oasst1_512_e3
08/31 10:34:21 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.9.17 (main, Jul  5 2023, 20:41:20) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 381146711
    GPU 0: NVIDIA GeForce RTX 3060
    GPU 1: Quadro P2000
    CUDA_HOME: None
    GCC: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
    PyTorch: 2.0.1
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.15.2
    OpenCV: 4.8.0
    MMEngine: 0.8.4

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 381146711
    deterministic: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

08/31 10:34:21 - mmengine - INFO - Config:
accumulative_counts = 1
batch_size = 1
betas = (
    0.9,
    0.999,
)
custom_hooks = [
    dict(
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path='internlm/internlm-7b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.DatasetInfoHook'),
    dict(
        evaluation_inputs=[
            '请给我介绍五个上海的景点',
            'Please tell me five scenic spots in Shanghai',
        ],
        every_n_iters=500,
        instruction=
        'xtuner.utils.PROMPT_TEMPLATE.openassistant.INSTRUCTION_START',
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path='internlm/internlm-7b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.EvaluateChatHook'),
]
data_path = 'timdettmers/openassistant-guanaco'
dataloader_num_workers = 0
default_hooks = dict(
    checkpoint=dict(interval=1, type='mmengine.hooks.CheckpointHook'),
    logger=dict(interval=10, type='mmengine.hooks.LoggerHook'),
    param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
    sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
    timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 500
evaluation_inputs = [
    '请给我介绍五个上海的景点',
    'Please tell me five scenic spots in Shanghai',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
lr = 0.0002
max_epochs = 3
max_length = 512
max_norm = 1
model = dict(
    llm=dict(
        pretrained_model_name_or_path='internlm/internlm-7b',
        quantization_config=dict(
            bnb_4bit_compute_dtype='torch.float16',
            bnb_4bit_quant_type='nf4',
            bnb_4bit_use_double_quant=True,
            llm_int8_has_fp16_weight=False,
            llm_int8_threshold=6.0,
            load_in_4bit=True,
            load_in_8bit=False,
            type='transformers.BitsAndBytesConfig'),
        torch_dtype='torch.float16',
        trust_remote_code=True,
        type='transformers.AutoModelForCausalLM.from_pretrained'),
    lora=dict(
        bias='none',
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        task_type='CAUSAL_LM',
        type='peft.LoraConfig'),
    type='xtuner.model.SupervisedFinetune')
optim_type = 'bitsandbytes.optim.PagedAdamW32bit'
optim_wrapper = dict(
    accumulative_counts=1,
    clip_grad=dict(error_if_nonfinite=False, max_norm=1),
    dtype='float16',
    loss_scale='dynamic',
    optimizer=dict(
        betas=(
            0.9,
            0.999,
        ),
        lr=0.0002,
        type='bitsandbytes.optim.PagedAdamW32bit',
        weight_decay=0),
    type='mmengine.optim.AmpOptimWrapper')
pack_to_max_length = False
param_scheduler = dict(
    T_max=3,
    by_epoch=True,
    convert_to_iter_based=True,
    eta_min=2e-05,
    type='mmengine.optim.CosineAnnealingLR')
pretrained_model_name_or_path = 'internlm/internlm-7b'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.openassistant'
randomness = dict(deterministic=False, seed=None)
resume = False
tokenizer = dict(
    padding_side='right',
    pretrained_model_name_or_path='internlm/internlm-7b',
    trust_remote_code=True,
    type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(by_epoch=True, max_epochs=3, val_interval=1)
train_dataloader = dict(
    batch_size=1,
    collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
    dataset=dict(
        dataset=dict(
            path='timdettmers/openassistant-guanaco',
            type='datasets.load_dataset'),
        dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
        max_length=512,
        pack_to_max_length=False,
        remove_unused_columns=True,
        shuffle_before_pack=True,
        template_map_fn=dict(
            template='xtuner.utils.PROMPT_TEMPLATE.openassistant',
            type='xtuner.dataset.map_fns.template_map_fn_factory'),
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path='internlm/internlm-7b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.dataset.process_hf_dataset'),
    num_workers=0,
    sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
train_dataset = dict(
    dataset=dict(
        path='timdettmers/openassistant-guanaco',
        type='datasets.load_dataset'),
    dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
    max_length=512,
    pack_to_max_length=False,
    remove_unused_columns=True,
    shuffle_before_pack=True,
    template_map_fn=dict(
        template='xtuner.utils.PROMPT_TEMPLATE.openassistant',
        type='xtuner.dataset.map_fns.template_map_fn_factory'),
    tokenizer=dict(
        padding_side='right',
        pretrained_model_name_or_path='internlm/internlm-7b',
        trust_remote_code=True,
        type='transformers.AutoTokenizer.from_pretrained'),
    type='xtuner.dataset.process_hf_dataset')
visualizer = None
weight_decay = 0
work_dir = './work_dirs/internlm_7b_qlora_oasst1_512_e3'

quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
08/31 10:34:21 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
Loading checkpoint shards: 100%|███████████████████████████████████████| 8/8 [01:34<00:00, 11.85s/it]
08/31 10:36:31 - mmengine - INFO - dispatch llama attn forward
/home/vansin/llm/xtuner/xtuner/model/fast_forward/__init__.py:18: UserWarning: Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
  warnings.warn(
08/31 10:36:31 - mmengine - INFO - dispatch internlm attn forward
/home/vansin/llm/xtuner/xtuner/model/fast_forward/__init__.py:32: UserWarning: Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
  warnings.warn(
08/31 10:36:31 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
08/31 10:36:33 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) DatasetInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) EvaluateChatHook                   
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) EvaluateChatHook                   
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) EvaluateChatHook                   
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
Repo card metadata block was not found. Setting CardData to empty.
08/31 10:36:40 - mmengine - WARNING - Dataset Dataset has no metainfo. ``dataset_meta`` in visualizer will be None.
08/31 10:36:40 - mmengine - INFO - Num train samples 9846
08/31 10:36:40 - mmengine - INFO - train example:
08/31 10:36:40 - mmengine - INFO -  <s> ### Human: Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.
### Assistant: "Monopsony" refers to a market structure where there is only one buyer for a particular good or service. In economics, this term is particularly relevant in the labor market, where a monopsony employer has significant power over the wages and working conditions of their employees. The presence of a monopsony can result in lower wages and reduced employment opportunities for workers, as the employer has little incentive to increase wages or provide better working conditions.

Recent research has identified potential monopsonies in industries such as retail and fast food, where a few large companies control a significant portion of the market (Bivens & Mishel, 2013). In these industries, workers often face low wages, limited benefits, and reduced bargaining power, leading to a situation where they are dependent on the employer for their livelihood. This dependence can result in further suppression of wages and a decline in working conditions.

Overall, the concept of monopsony is essential to understanding the dynamics of labor markets and the impact of market power on workers. Further research is needed to understand the extent and impact of monopsonies on the economy and to develop policies to address this issue.

References:
Bivens, J., & Mishel, L. (2013). The Pay of Corporate Executives and Financial Professionals as Evidence of Rents in Top 1 Percent Incomes. Journal of Economic Perspectives, 27(3), 57-78.</s>
08/31 10:36:40 - mmengine - INFO - before_train in EvaluateChatHook .
08/31 10:36:45 - mmengine - INFO - Sample output:
 <s>### Human: 请给我介绍五个上海的景点
### Assistant: 好的,请稍等
### Human: 好的,请给我介绍五个上海的景点
### Assistant: 好的,请稍等
### Human: 好的,请给我介绍五个上海的景点
### Assistant: 好的,

08/31 10:36:50 - mmengine - INFO - Sample output:
 <s>### Human: Please tell me five scenic spots in Shanghai
### Assistant: 1. The Bund
### Assistant: 2. The Oriental Pearl TV Tower
### Assistant: 3. The West Lake
### Assistant: 4. The Yuyuan Garden
### Assistant: 5. The Jade Buddha Temple

08/31 10:36:50 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
08/31 10:36:50 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
08/31 10:36:50 - mmengine - INFO - Checkpoints will be saved to /home/vansin/llm/xtuner/work_dirs/internlm_7b_qlora_oasst1_512_e3.
Error out of memory at line 380 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/pythonInterface.c
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2829:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit

[feature] How to set num_workers to speed up dataset loading?

When I run NPROC_PER_NODE=8 xtuner train llama2_7b_qlora_moss_sft_all_e2_gpu8, the speed of loading moss data is slow:

708it [00:02, 355.86it/s]09/11 13:02:29 - mmengine - INFO - Loading MOSS SFT data...
555779it [36:16, 237.63it/s]

[Bug] ChatGLM2 tokenizer mismatch on `eos_token_id` and `eos_token`

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
tokenizer.encode(tokenizer.eos_token)  # [64790, 64792, 2893, 30917, 30994]
tokenizer.eos_token_id  # 2

XTuner uses tokenizer.encode(tokenizer.eos_token) instead of tokenizer.eos_token_id to process data. As a result, the fine-tuned ChatGLM2 cannot stop generation.
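A minimal sketch of the mismatch (the token IDs are taken from the snippet above; the workaround shown is only an illustration, not XTuner's actual fix):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# Encoding the eos_token string re-tokenizes it as ordinary text (plus ChatGLM2's
# prefix tokens), so it does not produce the real EOS id.
print(tokenizer.encode(tokenizer.eos_token))  # [64790, 64792, 2893, 30917, 30994]
print(tokenizer.eos_token_id)                 # 2

# Appending the id directly avoids the mismatch when building input_ids / labels.
input_ids = tokenizer.encode("hello") + [tokenizer.eos_token_id]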

lmdeploy error. TypeError: 'NoneType' object cannot be interpreted as an integer

I'm following the README and am stuck at the deploy part:

I have done these steps:

  1. Fine-tuned chatglm2 with my own dataset.json --> got epoch_05.pth
  2. Ran the pth-to-hf conversion --> got the adapter folder
  3. Merged the adapter into chatglm2 --> got the merged HF model folder (merged-hf, containing 7 .bin files)

Then I ran the command:

python -m lmdeploy.pytorch.chat ./merged-hf --max_new_tokens 256 --temperture 0.8 --top_p 0.95 --seed 0
where ./merged-hf is the result of step 3.
And then I got this error:
[screenshot]

Training with deepspeed_zero2 fails to launch

GPU configuration

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P100-PCIE-16GB           On  | 00000000:04:00.0 Off |                    0 |
| N/A   59C    P0             1W / 250W |  0MiB / 16384MiB |    0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE-16GB           On  | 00000000:42:00.0 Off |                    0 |
| N/A   59C    P0             1W / 250W |  0MiB / 16384MiB |    0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Launch command

NPROC_PER_NODE=2 xtuner train /app/train/internlm_7b_qlora_alpaca_zh_e3/internlm_7b_qlora_alpaca_zh_e3.py --deepspeed deepspeed_zero2

Configuration in internlm_7b_qlora_alpaca_zh_e3.py

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from bitsandbytes.optim import PagedAdamW32bit
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_zh_map_fn, template_map_fn_factory
from xtuner.engine import DatasetInfoHook, EvaluateChatHook
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/app/models/internlm_internlm-7b'

# Data
alpaca_zh_path = 'silk-road/alpaca-data-gpt4-chinese'
prompt_template = PROMPT_TEMPLATE.alpaca
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 1  # per_device
accumulative_counts = 16
dataloader_num_workers = 0
max_epochs = 3
optim_type = PagedAdamW32bit
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip

# Evaluate the generation performance during the training
evaluation_freq = 500
evaluation_inputs = [
    '请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_zh = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path=alpaca_zh_path),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=alpaca_zh_map_fn,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_zh,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = dict(
    type=CosineAnnealingLR,
    eta_min=lr * 0.1,
    by_epoch=True,
    T_max=max_epochs,
    convert_to_iter_based=True)

# train, val, test setting
train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        instruction=prompt_template.INSTRUCTION_START)
]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 100 iterations.
    logger=dict(type=LoggerHook, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per epoch.
    checkpoint=dict(type=CheckpointHook, interval=1),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

Error output

[2023-08-31 12:23:17,914] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.1, git-hash=unknown, git-branch=unknown
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 5406) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
====================================================
/app/xtuner/tools/train.py FAILED
----------------------------------------------------

Training works normally when deepspeed is removed from the launch command.

NPROC_PER_NODE=2 xtuner train /app/train/internlm_7b_qlora_alpaca_zh_e3/internlm_7b_qlora_alpaca_zh_e3.py
08/31 13:44:01 - mmengine - INFO - Epoch(train) [1][ 260/2037]  lr: 1.9920e-04  eta: 1 day, 1:44:18  time: 16.2497  data_time: 0.0056  memory: 12276  loss: 1.9615  grad_norm: 0.0383
08/31 13:46:34 - mmengine - INFO - Epoch(train) [1][ 270/2037]  lr: 1.9914e-04  eta: 1 day, 1:39:49  time: 15.3228  data_time: 0.0058  memory: 12276  loss: 1.9860  grad_norm: 0.0383
08/31 13:49:17 - mmengine - INFO - Epoch(train) [1][ 280/2037]  lr: 1.9908e-04  eta: 1 day, 1:38:43  time: 16.2589  data_time: 0.0056  memory: 12276  loss: 1.9858  grad_norm: 0.0371
08/31 13:51:59 - mmengine - INFO - Epoch(train) [1][ 290/2037]  lr: 1.9901e-04  eta: 1 day, 1:37:35  time: 16.2844  data_time: 0.0059  memory: 12276  loss: 1.8885  grad_norm: 0.0364
08/31 13:54:33 - mmengine - INFO - Epoch(train) [1][ 300/2037]  lr: 1.9894e-04  eta: 1 day, 1:33:13  time: 15.3157  data_time: 0.0058  memory: 12276  loss: 1.9689  grad_norm: 0.0364
08/31 13:57:15 - mmengine - INFO - Epoch(train) [1][ 310/2037]  lr: 1.9887e-04  eta: 1 day, 1:31:48  time: 16.2196  data_time: 0.0055  memory: 12276  loss: 1.9434  grad_norm: 0.0355

Garbled output after deploying llama-2-7b-chat

  1. Trained for 1 epoch on a 4090 with xtuner train llama2_7b_chat_qlora_alpaca_zh_e3
  2. Converted the trained adapter to HF format with xtuner convert pth_to_hf ${CONFIG_NAME_OR_PATH} ${PTH} ${SAVE_PATH}
  3. Tested with xtuner chat ${NAME_OR_PATH_TO_LLM} --adapter ${NAME_OR_PATH_TO_ADAPTER}: English conversations show no garbled text, but Chinese conversations are almost entirely garbled
[screenshot]
  4. When testing with xtuner chat models--meta-llama--Llama-2-7b-chat-hf/snapshots/08751db2aca9bf2f7f80d2e516117a53d7450235 --adapter llama2-7b-hf --no-streamer, the Chinese garbling issue is resolved.
  5. Merged the model with xtuner convert merge ${NAME_OR_PATH_TO_LLM} ${NAME_OR_PATH_TO_ADAPTER} ${SAVE_PATH}
  6. After pip install lmdeploy and deploying with python -m lmdeploy.pytorch.chat ${NAME_OR_PATH_TO_LLM} --max_new_tokens 256 --temperture 0.8 --top_p 0.95 --seed 0, both English and Chinese output are garbled
[screenshot]
  7. Redeploying with python -m lmdeploy.pytorch.chat llama2-7b-deploy --max_new_tokens 256 --temperture 0.8 --top_p 0.95 --seed 0 --no-streamer still produces garbled output
[screenshot]

On parallelism

Thank you for your awesome work! I would like to know whether xtuner supports training with pp=2 and dp=4. I want to train a relatively large model and have to use pipeline parallelism. If not, do you have any plans to support it? Thank you!

datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

(xtuner) ➜  xtuner git:(main) python xtuner/tools/train.py xtuner/configs/internlm/internlm_chat_7b/internlm_chat_7b_qlora_arxiv_gentitle_e3.py
08/31 00:40:13 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.9.17 (main, Jul  5 2023, 20:41:20) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 686637565
    GPU 0: NVIDIA GeForce RTX 3060
    GPU 1: Quadro P2000
    CUDA_HOME: None
    GCC: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
    PyTorch: 2.0.1
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.15.2
    OpenCV: 4.8.0
    MMEngine: 0.8.4

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 686637565
    deterministic: False
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

08/31 00:40:13 - mmengine - INFO - Config:
accumulative_counts = 16
batch_size = 1
betas = (
    0.9,
    0.999,
)
custom_hooks = [
    dict(
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path='internlm/internlm-chat-7b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.DatasetInfoHook'),
    dict(
        evaluation_inputs=[
            'We present InternLM, a multilingual foundational language model with 104B parameters. InternLM is pre-trained on a large corpora with 1.6T tokens with a multi-phase progressive process, and then fine-tuned to align with human preferences. We also developed a training system called Uniscale-LLM for efficient large language model training. The evaluation on a number of benchmarks shows that InternLM achieves state-of-the-art performance in multiple aspects, including knowledge understanding, reading comprehension, mathematics, and coding. With such well-rounded capabilities, InternLM achieves outstanding performances on comprehensive exams, including MMLU, AGIEval, C-Eval and GAOKAO-Bench, without resorting to external tools. On these benchmarks, InternLM not only significantly outperforms open-source models, but also obtains superior performance compared to ChatGPT. Also, InternLM demonstrates excellent capability of understanding Chinese language and Chinese culture, which makes it a suitable foundation model to support Chinese-oriented language applications. This manuscript gives a detailed study of our results, with benchmarks and examples across a diverse set of knowledge domains and tasks.',
            'In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur fine-tuned LLMs, called LLAMA 2-CHAT, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closedsource models. We provide a detailed description of our approach to fine-tuning and safety improvements of LLAMA 2-CHAT in order to enable the community to build on our work and contribute to the responsible development of LLMs.',
        ],
        every_n_iters=500,
        instruction=
        'xtuner.utils.PROMPT_TEMPLATE.internlm_chat.INSTRUCTION_START',
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path='internlm/internlm-chat-7b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.EvaluateChatHook'),
]
data_path = './data/arxiv_postprocess_csAIcsCLcsCV_20200101.json'
dataloader_num_workers = 0
default_hooks = dict(
    checkpoint=dict(interval=1, type='mmengine.hooks.CheckpointHook'),
    logger=dict(interval=10, type='mmengine.hooks.LoggerHook'),
    param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
    sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
    timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 500
evaluation_inputs = [
    'We present InternLM, a multilingual foundational language model with 104B parameters. InternLM is pre-trained on a large corpora with 1.6T tokens with a multi-phase progressive process, and then fine-tuned to align with human preferences. We also developed a training system called Uniscale-LLM for efficient large language model training. The evaluation on a number of benchmarks shows that InternLM achieves state-of-the-art performance in multiple aspects, including knowledge understanding, reading comprehension, mathematics, and coding. With such well-rounded capabilities, InternLM achieves outstanding performances on comprehensive exams, including MMLU, AGIEval, C-Eval and GAOKAO-Bench, without resorting to external tools. On these benchmarks, InternLM not only significantly outperforms open-source models, but also obtains superior performance compared to ChatGPT. Also, InternLM demonstrates excellent capability of understanding Chinese language and Chinese culture, which makes it a suitable foundation model to support Chinese-oriented language applications. This manuscript gives a detailed study of our results, with benchmarks and examples across a diverse set of knowledge domains and tasks.',
    'In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur fine-tuned LLMs, called LLAMA 2-CHAT, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closedsource models. We provide a detailed description of our approach to fine-tuning and safety improvements of LLAMA 2-CHAT in order to enable the community to build on our work and contribute to the responsible development of LLMs.',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
lr = 0.0002
max_epochs = 3
max_length = 2048
max_norm = 1
model = dict(
    llm=dict(
        pretrained_model_name_or_path='internlm/internlm-chat-7b',
        quantization_config=dict(
            bnb_4bit_compute_dtype='torch.float16',
            bnb_4bit_quant_type='nf4',
            bnb_4bit_use_double_quant=True,
            llm_int8_has_fp16_weight=False,
            llm_int8_threshold=6.0,
            load_in_4bit=True,
            load_in_8bit=False,
            type='transformers.BitsAndBytesConfig'),
        torch_dtype='torch.float16',
        trust_remote_code=True,
        type='transformers.AutoModelForCausalLM.from_pretrained'),
    lora=dict(
        bias='none',
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        task_type='CAUSAL_LM',
        type='peft.LoraConfig'),
    type='xtuner.model.SupervisedFinetune')
optim_type = 'bitsandbytes.optim.PagedAdamW32bit'
optim_wrapper = dict(
    accumulative_counts=16,
    clip_grad=dict(error_if_nonfinite=False, max_norm=1),
    dtype='float16',
    loss_scale='dynamic',
    optimizer=dict(
        betas=(
            0.9,
            0.999,
        ),
        lr=0.0002,
        type='bitsandbytes.optim.PagedAdamW32bit',
        weight_decay=0),
    type='mmengine.optim.AmpOptimWrapper')
pack_to_max_length = True
param_scheduler = dict(
    T_max=3,
    by_epoch=True,
    convert_to_iter_based=True,
    eta_min=2e-05,
    type='mmengine.optim.CosineAnnealingLR')
pretrained_model_name_or_path = 'internlm/internlm-chat-7b'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.internlm_chat'
randomness = dict(deterministic=False, seed=None)
resume = False
tokenizer = dict(
    padding_side='right',
    pretrained_model_name_or_path='internlm/internlm-chat-7b',
    trust_remote_code=True,
    type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(by_epoch=True, max_epochs=3, val_interval=1)
train_dataloader = dict(
    batch_size=1,
    collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
    dataset=dict(
        dataset=dict(
            data_files=dict(
                train='./data/arxiv_postprocess_csAIcsCLcsCV_20200101.json'),
            path='json',
            type='datasets.load_dataset'),
        dataset_map_fn='xtuner.dataset.map_fns.arxiv_map_fn',
        max_length=2048,
        pack_to_max_length=True,
        remove_unused_columns=True,
        shuffle_before_pack=True,
        template_map_fn=dict(
            template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
            type='xtuner.dataset.map_fns.template_map_fn_factory'),
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path='internlm/internlm-chat-7b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.dataset.process_hf_dataset'),
    num_workers=0,
    sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
train_dataset = dict(
    dataset=dict(
        data_files=dict(
            train='./data/arxiv_postprocess_csAIcsCLcsCV_20200101.json'),
        path='json',
        type='datasets.load_dataset'),
    dataset_map_fn='xtuner.dataset.map_fns.arxiv_map_fn',
    max_length=2048,
    pack_to_max_length=True,
    remove_unused_columns=True,
    shuffle_before_pack=True,
    template_map_fn=dict(
        template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
        type='xtuner.dataset.map_fns.template_map_fn_factory'),
    tokenizer=dict(
        padding_side='right',
        pretrained_model_name_or_path='internlm/internlm-chat-7b',
        trust_remote_code=True,
        type='transformers.AutoTokenizer.from_pretrained'),
    type='xtuner.dataset.process_hf_dataset')
visualizer = None
weight_decay = 0
work_dir = './work_dirs/internlm_chat_7b_qlora_arxiv_gentitle_e3'

quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
08/31 00:40:13 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████| 8/8 [01:30<00:00, 11.30s/it]
08/31 00:42:19 - mmengine - INFO - dispatch llama attn forward
/home/vansin/llm/xtuner/xtuner/model/fast_forward/__init__.py:18: UserWarning: Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
  warnings.warn(
08/31 00:42:19 - mmengine - INFO - dispatch internlm attn forward
/home/vansin/llm/xtuner/xtuner/model/fast_forward/__init__.py:32: UserWarning: Due to the implementation of the PyTorch version of flash attention, even when the `output_attentions` flag is set to True, it is not possible to return the `attn_weights`.
  warnings.warn(
08/31 00:42:19 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
08/31 00:42:21 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) DatasetInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) EvaluateChatHook                   
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) EvaluateChatHook                   
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) EvaluateChatHook                   
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
Downloading data files: 100%|██████████████████████████████████████████████████████| 1/1 [00:00<00:00, 28926.23it/s]
Extracting data files: 100%|███████████████████████████████████████████████████████| 1/1 [00:00<00:00, 53092.46it/s]
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/datasets/builder.py", line 1949, in _prepare_split_single
    num_examples, num_bytes = writer.finalize()
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/datasets/arrow_writer.py", line 598, in finalize
    raise SchemaInferenceError("Please pass `features` or at least one example when writing data")
datasets.arrow_writer.SchemaInferenceError: Please pass `features` or at least one example when writing data

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vansin/llm/xtuner/xtuner/tools/train.py", line 225, in <module>
    main()
  File "/home/vansin/llm/xtuner/xtuner/tools/train.py", line 221, in main
    runner.train()
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/runner/runner.py", line 1703, in train
    self._train_loop = self.build_train_loop(
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/runner/runner.py", line 1502, in build_train_loop
    loop = EpochBasedTrainLoop(
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/runner/loops.py", line 44, in __init__
    super().__init__(runner, dataloader)
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/runner/base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/runner/runner.py", line 1353, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/vansin/llm/xtuner/xtuner/dataset/huggingface.py", line 60, in process_hf_dataset
    dataset = BUILDER.build(dataset)
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/datasets/load.py", line 2136, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/datasets/builder.py", line 1813, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/home/vansin/miniconda3/envs/xtuner/lib/python3.9/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
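The "Generating train split: 0 examples" line just before the SchemaInferenceError suggests the preprocessed arXiv JSON contains no records. A hedged sanity check on the file (path taken from the config above) can confirm that before re-running the preprocessing step:

    import os

    path = './data/arxiv_postprocess_csAIcsCLcsCV_20200101.json'
    print(os.path.getsize(path), 'bytes')
    with open(path) as f:
        head = f.read(500)
    print(repr(head))  # an empty file or '[]' here would explain the 0-example train split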

- mmengine - WARNING: command error: ''pth_to_hf''!

I'm following the README.md, and when I get to Fine-tune / Step 2, xtuner convert pth_to_hf ${CONFIG_NAME_OR_PATH} ${PTH} ${SAVE_PATH}, the error occurs.

The command I enter:

xtuner convert pth_to_hf chatglm2_6b_qlora_alpaca_e3_copy.py work_dirs/chatglm2_6b_qlora_alpaca_e3_copy/epoch_3.pth ./

How to merge after fine-tuning

xtuner convert merge_adapter \ ${CONFIG} \ ${PATH_TO_PTH_ADAPTER} \ ${SAVE_PATH_TO_MERGED_LLM} \ --max-shard-size 2GB

What exactly is CONFIG here? Also, the directory saved after fine-tuning only contains .pth files.
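A hedged sketch of how the placeholders are usually filled in, assuming CONFIG is the same training config (.py) used for xtuner train and the adapter is the .pth checkpoint written under work_dirs; all paths below are hypothetical:

    # CONFIG: the training config; the .pth adapter comes from work_dirs;
    # ./merged_llm is where the merged weights are written.
    xtuner convert merge_adapter \
        internlm_chat_7b_qlora_arxiv_gentitle_e3.py \
        ./work_dirs/internlm_chat_7b_qlora_arxiv_gentitle_e3/epoch_3.pth \
        ./merged_llm \
        --max-shard-size 2GB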

[pth2hf Error]KeyError: 'transformers.AutoModelForCausalLM.from_pretrained is not in the xtuner::builder registry.

(xtuner) ➜  xtuner git:(main) ✗ python xtuner/tools/model_converters/pth_to_hf.py work_dirs/internlm_20b_qlora_oasst1_512_e3/internlm_20b_qlora_oasst1_512_e3.py work_dirs/internlm_20b_qlora_oasst1_512_e3/epoch_3.pth work_dirs/internlm_20b_qlora_arxiv_gentitle_e3/hf
[2023-10-01 04:48:16,040] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 108, in <module>
    main()
  File "xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 85, in main
    model = BUILDER.build(cfg.model)
  File "miniconda/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "miniconda/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "xtuner/xtuner/model/sft.py", line 24, in __init__
    self.llm = self._build_from_cfg_or_module(llm)
  File "xtuner/xtuner/model/sft.py", line 75, in _build_from_cfg_or_module
    return BUILDER.build(cfg_or_mod)
  File "miniconda/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "miniconda/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 100, in build_from_cfg
    raise KeyError(
KeyError: 'transformers.AutoModelForCausalLM.from_pretrained is not in the xtuner::builder registry. Please check whether the value of `transformers.AutoModelForCausalLM.from_pretrained` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'

chat answer has no space character

I want to compare the original model performance, so I run:

xtuner chat THUDM/chatglm2-6b

And the response contains no space characters. The same issue also occurs when --adapter is added. How can I fix this?


TypeError: FormatCode() got an unexpected keyword argument 'verify'

Cloned the source code:

commit b05ad8d (HEAD -> main, origin/main, origin/HEAD)

pip install -e .[all]
xtuner train xxxx

Every run fails with the error message below.

Traceback (most recent call last):
  File "/opt/conda/envs/xtuner/lib/python3.11/site-packages/mmengine/config/config.py", line 1475, in pretty_text
    text, _ = FormatCode(
              ^^^^^^^^^^^
TypeError: FormatCode() got an unexpected keyword argument 'verify'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/jovyan/work/xtuner/xtuner/tools/train.py", line 246, in <module>
    main()
  File "/data/jovyan/work/xtuner/xtuner/tools/train.py", line 235, in main
    runner = Runner.from_cfg(cfg)
             ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/xtuner/lib/python3.11/site-packages/mmengine/runner/runner.py", line 445, in from_cfg
    runner = cls(
             ^^^^
  File "/opt/conda/envs/xtuner/lib/python3.11/site-packages/mmengine/runner/runner.py", line 386, in __init__
    self._log_env(env_cfg)
  File "/opt/conda/envs/xtuner/lib/python3.11/site-packages/mmengine/runner/runner.py", line 2356, in _log_env
    self.logger.info(f'Config:\n{self.cfg.pretty_text}')
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/xtuner/lib/python3.11/site-packages/mmengine/config/config.py", line 1478, in pretty_text
    raise SyntaxError('Failed to format the config file, please '
SyntaxError: Failed to format the config file, please check the syntax of: 
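This traceback usually points to an incompatibility between mmengine's config pretty-printer and newer yapf releases, which dropped the verify keyword from FormatCode. A commonly reported workaround (not an official fix) is to pin yapf, or to upgrade mmengine to a release that no longer passes that argument:

    pip install "yapf<0.40.2"
    # or, alternatively
    pip install -U mmengine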

Error when loading a local model

Hi, after downloading the internLM-7b model locally, I ran xtuner train following the incremental-training workflow, and it failed with the error below:

Unable to load weights from pytorch checkpoint file for '/gemini/pretrain/pytorch_model-00001-of-00008.bin'

My config is attached: config.txt

Also, loading the tokenizer on its own with transformers.AutoTokenizer.from_pretrained('/gemini/pretrain', trust_remote_code=True) works fine.

What could be causing this?

Missing unit tests

CONTRIBUTING.md mentions pytest tests, but there were no test files in the repo at the time I filed this issue. Is there any plan to provide test files?

Qwen-7B-chat runs out of GPU memory on Colab; switching to the Int4 model raises an error

Training Qwen-7B on custom single-turn data on Colab (T4, 15 GB of GPU memory) fails with CUDA OOM.

  1. In the examples, InternLM-7B fine-tunes without problems; why does Qwen, also a 7B model, fail? Is there any way to lower the GPU memory requirement? (A memory-saving sketch follows the traceback below.)
    The model config is as follows:

    model = dict(
        type=SupervisedFinetune,
        llm=dict(
            type=AutoModelForCausalLM.from_pretrained,
            pretrained_model_name_or_path=pretrained_model_name_or_path,
            trust_remote_code=True,
            torch_dtype=torch.float16,
            quantization_config=dict(
                type=BitsAndBytesConfig,
                load_in_4bit=True,
                load_in_8bit=False,
                llm_int8_threshold=6.0,
                llm_int8_has_fp16_weight=False,
                bnb_4bit_compute_dtype=torch.float16,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type='nf4')),
        lora=dict(
            type=LoraConfig,
            r=64,
            lora_alpha=16,
            lora_dropout=0.1,
            bias='none',
            task_type='CAUSAL_LM'))

  2. Switching 'Qwen/Qwen-7B-Chat' to 'Qwen/Qwen-7B-Chat-Int4' raises the following error:

Downloading (…)lve/main/config.json: 100% 1.20k/1.20k [00:00<00:00, 6.91MB/s]
Downloading (…)onfiguration_qwen.py: 100% 2.09k/2.09k [00:00<00:00, 13.6MB/s]
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat-Int4:
  • configuration_qwen.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)ain/modeling_qwen.py: 100% 47.1k/47.1k [00:00<00:00, 5.05MB/s]
Downloading (…)_generation_utils.py: 100% 14.6k/14.6k [00:00<00:00, 70.7MB/s]
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat-Int4:
  • qwen_generation_utils.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Qwen/Qwen-7B-Chat-Int4:
  • modeling_qwen.py
  • qwen_generation_utils.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/train.py", line 225, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/train.py", line 214, in main
    runner = Runner.from_cfg(cfg)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 445, in from_cfg
    runner = cls(
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 412, in __init__
    self.model = self.build_model(model)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 819, in build_model
    model = MODELS.build(model)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/xtuner/model/sft.py", line 24, in __init__
    self.llm = self._build_from_cfg_or_module(llm)
  File "/usr/local/lib/python3.10/dist-packages/xtuner/model/sft.py", line 76, in _build_from_cfg_or_module
    return BUILDER.build(cfg_or_mod)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2556, in from_pretrained
    loading_attr_dict = quantization_config.get_loading_attributes()
AttributeError: 'BitsAndBytesConfig' object has no attribute 'get_loading_attributes'
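Addressing point 1 above: this is a minimal, hedged sketch of the usual memory levers for QLoRA on a 15 GB T4, expressed against the variable names used by the stock xtuner configs; the exact values are assumptions to experiment with, not verified settings.

    # Shorter packed sequences are the largest lever on activation memory.
    max_length = 512              # instead of 2048
    pack_to_max_length = True
    batch_size = 1                # per device
    accumulative_counts = 16      # keep the effective batch size via gradient accumulation

Launching with the ZeRO-2 offload DeepSpeed config (used elsewhere in these reports) may also move optimizer state off the GPU; the config name is a placeholder:

    xtuner train <your_qwen_config>.py --deepspeed deepspeed_zero2_offload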

How can I do full parameter fine-tuning the model with FP16

I modified llama2_7b_full_wizardlm_e1_copy.py with alpaca_dataset and added parameter torch_dtype=torch.float16 in model loading, as following:

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16
        )
    )

If I run the script with deepspeed it works, but if I run it without deepspeed I get the following error:

ValueError: Attempting to unscale FP16 gradients.
    return wrapped(*args, **kwargs)
  File "/root/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/optim/optimizer/amp_optimizer_wrapper.py", line 136, in step
    self.loss_scaler.unscale_(self.optimizer)
  File "/root/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/root/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 229, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")

The environment versions are as follows:

transformers   4.33.0
peft           0.5.0
torch          2.1.0
CUDA Version   12.2
Python         3.10.13

I use an 80 GB A800 and llama2-7b; if torch_dtype=torch.float16 is removed, OOM happens.
Could anyone help me with this problem? Any suggestion would be appreciated.
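One workaround often suggested for "Attempting to unscale FP16 gradients" in full-parameter training is to run mixed precision in bfloat16 instead of loading the weights in fp16: bf16 gradients are not subject to the fp16 unscale check, and the A800 supports bf16. This is a hedged fragment of the config (reusing the stock variable names and imports), not a verified fix:

    model = dict(
        type=SupervisedFinetune,
        llm=dict(
            type=AutoModelForCausalLM.from_pretrained,
            pretrained_model_name_or_path=pretrained_model_name_or_path,
            trust_remote_code=True,
            torch_dtype=torch.bfloat16))   # bf16 weights instead of fp16

    optim_wrapper = dict(
        type=AmpOptimWrapper,
        optimizer=dict(type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
        clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
        accumulative_counts=accumulative_counts,
        dtype='bfloat16')                  # autocast in bf16; no fp16 grads to unscale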

TypeError: 'NoneType' object cannot be interpreted as an integer; libbitsandbytes_cpu.so: undefined symbol: cget_managed_ptr

Running xtuner train internlm_chat_7b_qlora_lawyer_e3 --deepspeed deepspeed_zero2_offload
fails after the data has finished loading.

Hardware: two 24 GB P40 GPUs
OS: Ubuntu 20.04
Python: 3.10
CUDA: 11.8
nvcc -v: 10.1
transformers: 4.33.1
bitsandbytes: 0.40.0
bitsandbytes-windows2.0.0+cu118: 0.37.5
torch: 2.0.0+cu118

Detailed error:

[2023-09-12 21:25:12,865] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.10.3, git-hash=unknown, git-branch=unknown
[2023-09-12 21:25:12,865] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-09-12 21:25:12,865] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2023-09-12 21:25:12,936] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.2.140, master_port=29500
[2023-09-12 21:25:12,937] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-09-12 21:25:14,190] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-09-12 21:25:14,194] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-09-12 21:25:14,194] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-09-12 21:25:14,290] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = PagedAdamW32bit
[2023-09-12 21:25:14,290] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=PagedAdamW32bit type=<class 'bitsandbytes.optim.adamw.PagedAdamW32bit'>
[2023-09-12 21:25:14,290] [WARNING] [engine.py:1149:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2023-09-12 21:25:14,290] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-09-12 21:25:14,290] [INFO] [stage_1_and_2.py:146:__init__] Reduce bucket size 100000000
[2023-09-12 21:25:14,290] [INFO] [stage_1_and_2.py:147:__init__] Allgather bucket size 100000000
[2023-09-12 21:25:14,290] [INFO] [stage_1_and_2.py:148:__init__] CPU Offload: True
[2023-09-12 21:25:14,290] [INFO] [stage_1_and_2.py:149:__init__] Round robin gradient partitioning: False
Rank: 0 partition count [1] and sizes[(159907840, False)]
[2023-09-12 21:25:15,933] [INFO] [utils.py:803:see_memory_usage] Before initializing optimizer states
[2023-09-12 21:25:15,934] [INFO] [utils.py:804:see_memory_usage] MA 13.97 GB Max_MA 13.97 GB CA 14.04 GB Max_CA 14 GB
[2023-09-12 21:25:15,934] [INFO] [utils.py:811:see_memory_usage] CPU Virtual Memory: used = 7.31 GB, percent = 5.8%
Traceback (most recent call last):
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 246, in <module>
    main()
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 242, in main
    runner.train()
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1180, in train
    self.strategy.prepare(
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/mmengine/_strategy/deepspeed.py", line 181, in prepare
    self.model = self._wrap_model(model)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/mmengine/_strategy/deepspeed.py", line 196, in _wrap_model
    engine, self.optim_wrapper.optimizer, *_ = deepspeed.initialize(
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/deepspeed/__init__.py", line 171, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 303, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1213, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1467, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 516, in __init__
    self.initialize_optimizer_states()
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 651, in initialize_optimizer_states
    self.optimizer.step()
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 266, in step
    self.init_state(group, p, gindex, pindex)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 401, in init_state
    state["state1"] = self.get_state_buffer(p, dtype=torch.float32)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 309, in get_state_buffer
    buff = F.get_paged(*p.shape, dtype=dtype, device=p.device)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/bitsandbytes/functional.py", line 171, in get_paged
    cuda_ptr = lib.cget_managed_ptr(ct.c_size_t(num_bytes))
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/tfswx/anaconda3/envs/internlm-env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_managed_ptr
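The libbitsandbytes_cpu.so in the path indicates bitsandbytes fell back to its CPU-only build, which does not export the paged-optimizer symbol cget_managed_ptr. A hedged first step (not a guaranteed fix, especially on older Pascal cards like the P40) is to reinstall a bitsandbytes build that matches the local CUDA runtime and run its self-check:

    pip install -U bitsandbytes
    # bitsandbytes ships a diagnostic entry point that reports which library it loaded
    python -m bitsandbytes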

About RLHF needs

Several alignment algorithms would need to be implemented:

  1. PPO
    No question about this one; it is the traditional, general-purpose choice, though the training overhead is somewhat higher.
  2. RAFT
    The LMFlow community has an implementation:
    https://optimalscale.github.io/LMFlow/examples/raft.html
  3. pangu-coder2
    RRTF (Rank Responses to align Test & Teacher Feedback)
    In short, they run code unit tests and fold the unit-test results into the loss as labels to fine-tune the LLM.
    https://arxiv.org/abs/2307.14936

Huawei has not open-sourced the RRTF part. RAFT is open-source; if there is interest, we could discuss and implement RRTF together.

OSError: ./work_dirs/internlm_7b_qlora_alpaca_e3_copy/pth2huggingface does not appear to have a file named config.json.

I've followed the tutorial in the README to complete the fine-tuning of internlm_7b on the alpaca dataset, and then I ran

xtuner convert pth_to_hf ./work_dirs/internlm_7b_qlora_alpaca_e3_copy/internlm_7b_qlora_alpaca_e3_copy.py ./work_dirs/internlm_7b_qlora_alpaca_e3_copy/epoch_1.pth ./work_dirs/internlm_7b_qlora_alpaca_e3_copy/pth2huggingface

After converting to a HuggingFace model, I want to chat with the model I've just fine-tuned, but the generated folder doesn't contain a config.json file. Do I need to copy the config.json from internlm_7b into it, or did something go wrong in the previous step?
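For context, pth_to_hf on a QLoRA run produces a LoRA adapter folder rather than a standalone model, so a config.json for the base LLM is not expected there. A hedged sketch of the two usual next steps, with hypothetical paths:

    # 1) Chat with the base model plus the converted adapter
    xtuner chat internlm/internlm-7b --adapter ./work_dirs/internlm_7b_qlora_alpaca_e3_copy/pth2huggingface

    # 2) Or merge the adapter into the base model to obtain a standalone folder (with config.json)
    xtuner convert merge_adapter \
        ./work_dirs/internlm_7b_qlora_alpaca_e3_copy/internlm_7b_qlora_alpaca_e3_copy.py \
        ./work_dirs/internlm_7b_qlora_alpaca_e3_copy/epoch_1.pth \
        ./merged_internlm_7b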

ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.

when run: !xtuner chat internlm/internlm-7b --adapter internlm_7b_qlora_colorist/epoch_1_hf --prompt-template colorist
some errors happened:
2023-09-22 07:01:14.626725: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-22 07:01:21.715312: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/chat.py", line 263, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/chat.py", line 136, in main
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3187, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3308, in _load_pretrained_model
    raise ValueError(
ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.
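As the error message itself suggests, two hedged things to try are installing safetensors (so the checkpoint can be loaded in that format) or giving accelerate a disk-offload directory; the path below is hypothetical:

    pip install safetensors
    # or, if offloading to disk is really intended, pass an offload directory to from_pretrained, e.g.
    # AutoModelForCausalLM.from_pretrained(..., offload_folder='./offload')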

CUDA out of memory

Describe
CUDA out of memory.
I'm fine-tuning llama-2-70B on 3 machines with 8 A100 (40 GB) GPUs each, i.e. 24 A100-40G in total. At first this looks like a plain out-of-memory problem, but by my calculation the total memory should be more than enough.

To Reproduce

  1. pip install xtuner
  2. I replaced the HuggingFace model path in llama2_70b_qlora_open_platypus_e1.py with a locally downloaded Llama-2-70b-hf:
# model
pretrained_model_name_or_path = '/mnt/model/Llama-2-70b-hf'
# and also the dataset
data_path = '/mnt/model/Open-Platypus'
Master(A100*8): NPROC_PER_NODE=8 NNODES=3 NODE_RANK=0 PORT=34545 ADDR=192.168.0.6 xtuner train llama2_70b_qlora_open_platypus_e1
(and [A100*8]NODE_RANK=1,[A100*8]NODE_RANK=2)

System info

  • OS: Ubuntu 20.04.6 LTS
  • Configured as 3 groups with 8*A100 graphics cards (total of 24 A100-40G graphics cards)
  • Python = 3.10

ERROR record

model = MMDistributedDataParallel(
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/mmengine/model/wrappers/distributed.py", line 93, in __init__
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.11 GiB (GPU 6; 39.45 GiB total capacity; 37.14 GiB already allocated; 1.60 GiB free; 37.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.11 GiB (GPU 4; 39.45 GiB total capacity; 37.14 GiB already allocated; 1.60 GiB free; 37.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.11 GiB (GPU 5; 39.45 GiB total capacity; 37.14 GiB already allocated; 1.60 GiB free; 37.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
super().__init__(module=module, **kwargs)
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 688, in __init__
    self._ddp_init_helper(
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 825, in _ddp_init_helper
    self.reducer = dist.Reducer(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.11 GiB (GPU 7; 39.45 GiB total capacity; 37.14 GiB already allocated; 1.60 GiB free; 37.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2120836) of binary: /mnt/anaconda/envs/xtuner/bin/python
Traceback (most recent call last):
  File "/mnt/anaconda/envs/xtuner/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/mnt/anaconda/envs/xtuner/lib/python3.10/site-packages/xtuner/tools/train.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2120837)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 2120838)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 2120839)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 4 (local_rank: 4)
  exitcode  : 1 (pid: 2120840)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 2120841)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 6 (local_rank: 6)
  exitcode  : 1 (pid: 2120842)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 7 (local_rank: 7)
  exitcode  : 1 (pid: 2120843)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-02_01:48:53
  host      : gzyd29
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2120836)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

xtuner train configs/internlm_7b_qlora_alpaca_internlm_assistant.py fails with an error

(xtuner) root@4cf1efcc4830:/store/home_workspace# xtuner train configs/internlm_7b_qlora_alpaca_internlm_assistant.py
/root/miniconda3/envs/xtuner/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/root/miniconda3/envs/xtuner/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
[2023-10-07 18:05:31,692] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-07 18:05:33,690] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/root/miniconda3/envs/xtuner/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/root/miniconda3/envs/xtuner/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
Traceback (most recent call last):
  File "/store/home_workspace/xtuner/xtuner/tools/train.py", line 246, in <module>
    main()
  File "/store/home_workspace/xtuner/xtuner/tools/train.py", line 87, in main
    cfg = Config.fromfile(args.config)
  File "/root/miniconda3/envs/xtuner/lib/python3.10/site-packages/mmengine/config/config.py", line 456, in fromfile
    lazy_import is None and not Config._is_lazy_import(filename):
  File "/root/miniconda3/envs/xtuner/lib/python3.10/site-packages/mmengine/config/config.py", line 1656, in _is_lazy_import
    codes_str = f.read()
  File "/root/miniconda3/envs/xtuner/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 1616: invalid continuation byte
(xtuner) root@4cf1efcc4830:/store/home_workspace#
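The UnicodeDecodeError means the config file is not valid UTF-8 (byte 0xc7 hints at a legacy Chinese/Windows encoding, though that is only a guess). A hedged one-off fix is to re-encode the file; src_enc below is an assumption to adjust:

    # Hypothetical re-encoding helper; change src_enc if GBK is not the real encoding.
    src = 'configs/internlm_7b_qlora_alpaca_internlm_assistant.py'
    src_enc = 'gbk'
    with open(src, 'r', encoding=src_enc, errors='replace') as f:
        text = f.read()
    with open(src, 'w', encoding='utf-8') as f:
        f.write(text)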

Huggingface website cannot be reached

I get this error: 'We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like internlm/internlm-7b is not the path to a directory containing a file named config.json'. How can I solve this problem? Is there any code I should change?
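When huggingface.co is unreachable, a commonly used workaround (hedged, and in line with what other reports in this thread do) is to download the weights once and point the config at a local directory; the path is hypothetical:

    # In the training config, replace the hub name with a local path
    pretrained_model_name_or_path = '/path/to/local/internlm-7b'
    # huggingface_hub also honours a mirror endpoint via an environment variable, e.g.
    #   export HF_ENDPOINT=https://hf-mirror.com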

[Bug] Can't the qwen model change the target modules in xtuner?

ValueError: Target modules ['gate_proj', 'down_proj', 'up_proj'] not found in the base model. Please check the target modules and try again.

What's the reason for this? When I use these target modules with llama2-7b it works fine, but with qwen-7b-chat it raises an error.
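Most likely the module names simply differ between architectures: gate_proj/down_proj/up_proj are LLaMA-style names, while Qwen's blocks reportedly use c_attn/c_proj and w1/w2 (worth verifying). A hedged way to list the Linear layer names that LoRA can actually target for a given checkpoint:

    from transformers import AutoModelForCausalLM

    # Load the model once and collect the leaf names of its Linear layers.
    model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen-7B-Chat', trust_remote_code=True)
    linear_names = {name.split('.')[-1]
                    for name, module in model.named_modules()
                    if module.__class__.__name__ == 'Linear'}
    print(sorted(linear_names))  # pick target_modules for LoraConfig from this set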

Issue with LoRA training in fp16

Most of the examples and configs run LoRA training with int8/int4. When testing LoRA training in fp16, I found that while loading the model in fp16, SupervisedFinetune calls prepare_model_for_kbit_training, which converts every non-8bit parameter to fp32; as a result, even a 7B model needs more than 40 GB of GPU memory for LoRA (fp16).

After pip install xtuner finishes, it still shows ModuleNotFoundError: No module named 'xtuner'

(venv) PS G:\ai\xtuner> xtuner list-cfg
G:\ai\xtuner\venv\lib\site-packages\bitsandbytes\cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Traceback (most recent call last):
  File "G:\ai\xtuner\venv\lib\site-packages\xtuner\tools\list_cfg.py", line 4, in <module>
    from xtuner.configs import cfgs_name_path
ModuleNotFoundError: No module named 'xtuner'
(venv) PS G:\ai\xtuner>

Incremental training question

I intend to train on a dataset from a specific vertical domain, and I'm wondering about the data mixture. Is it advisable to mix in general data to prevent the model from forgetting? Is there any guidance on this?

[Question] About training visualization

Thanks for your contributions to the project!

Problem description

For the fine-tuning process, does xtuner provide a way to visualize training progress and loss curves in real time?

Looking forward to your reply!
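Since xtuner configs are standard mmengine configs, one hedged option is to plug an mmengine visualization backend (e.g. TensorBoard) into the visualizer field they already expose; this follows mmengine's documented Visualizer API rather than an xtuner-specific feature:

    from mmengine.visualization import Visualizer, TensorboardVisBackend

    # Replace `visualizer = None` in the config; scalars such as the loss are then
    # written under work_dir and can be viewed with `tensorboard --logdir ...`.
    visualizer = dict(
        type=Visualizer,
        vis_backends=[dict(type=TensorboardVisBackend)])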

Is LoRA training supported with tensor parallelism (TP)?

I'd like to use LoRA on multiple GPUs to train larger models (20B and above), with the frozen base model sharded in tensor-parallel fashion. Does xtuner support this mode?

Baichuan2 environment

Could you share which Python version is used, and whether any packages need to be installed from source? Baichuan2's package requirements are a bit odd. Thanks.

AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

root@localhost:/workspace/xtuner# pip install -e '.[all]'
root@localhost:/workspace/xtuner# xtuner
[2023-09-11 15:13:59,638] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/usr/local/bin/xtuner", line 33, in <module>
    sys.exit(load_entry_point('xtuner', 'console_scripts', 'xtuner')())
  File "/usr/local/bin/xtuner", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/workspace/xtuner/xtuner/__init__.py", line 4, in <module>
    from .entry_point import cli
  File "/workspace/xtuner/xtuner/entry_point.py", line 11, in <module>
    from xtuner.tools import (chat, check_custom_dataset, copy_cfg, list_cfg,
  File "/workspace/xtuner/xtuner/tools/test.py", line 9, in <module>
    from mmengine.runner import Runner
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/__init__.py", line 2, in <module>
    from ._flexible_runner import FlexibleRunner
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/_flexible_runner.py", line 14, in <module>
    from mmengine._strategy import BaseStrategy
  File "/usr/local/lib/python3.10/dist-packages/mmengine/_strategy/__init__.py", line 3, in <module>
    from mmengine.utils.dl_utils import TORCH_VERSION
  File "/usr/local/lib/python3.10/dist-packages/mmengine/utils/dl_utils/__init__.py", line 3, in <module>
    from .collect_env import collect_env
  File "/usr/local/lib/python3.10/dist-packages/mmengine/utils/dl_utils/collect_env.py", line 8, in <module>
    import cv2
  File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 175, in bootstrap
    if __load_extra_py_code_for_module("cv2", submodule, DEBUG):
  File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 28, in __load_extra_py_code_for_module
    py_module = importlib.import_module(module_name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/cv2/typing/__init__.py", line 169, in <module>
    LayerId = cv2.dnn.DictValue
AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

How can a per-conversation (dynamic) system prompt be set during training?

Hi, I'm training InternLM 20b with QLoRA using xtuner.
However, I can only change the system prompt in the template file, while each conversation in my dataset has its own system prompt, and I don't know how to set that up.
I'm using the alpaca template.
The dataset is in xtuner's standard multi-turn conversation format.
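For reference, the standard multi-turn format described in the xtuner dataset docs appears to accept a per-conversation system field, which a template with a system slot can then pick up; this is a hedged example of that shape (field values are made up) and worth checking against the docs for the template in use:

    # Hypothetical record in the standard multi-turn format with its own system prompt
    sample = {
        'conversation': [
            {'system': 'You are a polite customer-service assistant.',
             'input': 'Hello',
             'output': 'Hi, how can I help you?'},
            {'input': 'Tell me about your return policy.',
             'output': 'Of course, our return policy is ...'},
        ]
    }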

The custom map_fn fails to load: 'package' argument is required to perform a relative import for '.custom_map_fn'

I followed the usage described in the docs: https://github.com/InternLM/xtuner/blob/main/docs/zh_cn/user_guides/single_turn_conversation.md

I defined the mapping-function file custom_adcompcot_map_fn.py under ./ and imported it in my custom config:

from .custom_adcompcot_map_fn import custom_map_fn

Running train then fails with the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mmengine/config/lazy.py", line 68, in build
    module = importlib.import_module(self._module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 121, in import_module
    raise TypeError(msg.format(name))
TypeError: the 'package' argument is required to perform a relative import for '.custom_adcompcot_map_fn'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/train.py", line 225, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/train.py", line 221, in main
    runner.train()
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1703, in train
    self._train_loop = self.build_train_loop(
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1502, in build_train_loop
    loop = EpochBasedTrainLoop(
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 44, in __init__
    super().__init__(runner, dataloader)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/runner.py", line 1353, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/lib/python3.10/dist-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/mmengine/config/config.py", line 135, in __getitem__
    return self.build_lazy(super().__getitem__(key))
  File "/usr/local/lib/python3.10/dist-packages/mmengine/config/config.py", line 214, in build_lazy
    value = value.build()
  File "/usr/local/lib/python3.10/dist-packages/mmengine/config/lazy.py", line 70, in build
    raise type(e)(f'Failed to import {self._module} '
TypeError: Failed to import .custom_adcompcot_map_fn in ./qwen_7b_chat_custom_AdCompCoT_e1.py, line 17 for the 'package' argument is required to perform a relative import for '.custom_adcompcot_map_fn'
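Because mmengine imports the config lazily rather than as part of a package, a relative import such as from .custom_adcompcot_map_fn import custom_map_fn has no parent package to resolve against. A hedged workaround is to avoid the relative import, for example by defining the map function inside the config file itself; the function body below is purely illustrative:

    # Inside qwen_7b_chat_custom_AdCompCoT_e1.py, instead of a relative import,
    # define the mapping function in the config itself.
    def custom_map_fn(example):
        # Illustrative field names; adapt them to the real dataset schema.
        return {
            'conversation': [{
                'input': example['question'],
                'output': example['answer'],
            }]
        }

    # ...and reference it directly in the dataset section of the same config:
    # dataset_map_fn=custom_map_fn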

[Bug] The dataset is not auto-downloaded.

When I run NPROC_PER_NODE=8 xtuner train llama2_7b_qlora_moss_sft_all_e2_gpu8, the dataset is not auto-downloaded.

I have to mkdir ./data/ and download the dataset moss-003-sft-no-tools.jsonl by hand.

Multi-turn conversations are parsed incorrectly

I converted my dataset to the standard format and trained with the alpaca template, but everything that gets parsed comes out as none.

(screenshot of the parsed output)
Below is my dataset format:
(screenshot of the dataset)
Below is my config file:
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from bitsandbytes.optim import PagedAdamW32bit
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.engine import DatasetInfoHook, EvaluateChatHook
from xtuner.model import SupervisedFinetune
from xtuner.utils import PROMPT_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/autodl-tmp/waifu20b/'

# Data
alpaca_en_path = '/root/autodl-tmp/output_blue.json'
prompt_template = PROMPT_TEMPLATE.alpaca
max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer
batch_size = 1  # per_device
accumulative_counts = 16
dataloader_num_workers = 0
max_epochs = 3
optim_type = PagedAdamW32bit
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip

# Evaluate the generation performance during the training
evaluation_freq = 500
evaluation_inputs = [
    '你好。', 'hi.'
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                     PART 3  Dataset & Dataloader                    #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length)

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=DefaultSampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = dict(
    type=CosineAnnealingLR,
    eta_min=lr * 0.1,
    by_epoch=True,
    T_max=max_epochs,
    convert_to_iter_based=True)

# train, val, test setting
train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        max_new_tokens=2048,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        instruction=prompt_template.INSTRUCTION_START)
]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 100 iterations.
    logger=dict(type=LoggerHook, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per epoch.
    checkpoint=dict(type=CheckpointHook, interval=1),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable deterministic
randomness = dict(seed=None, deterministic=False)
