
hiyouga / fastedit


🩹Editing large language models within 10 seconds⚡

License: Apache License 2.0

Python 100.00%
llms chatgpt gpt llama transformers large-language-models chatbots bloom falcon pytorch

fastedit's Introduction

FastEdit ⚡🩹

Editing large language models within 10 seconds


One-Sentence Summary

This repo aims to help developers inject fresh, customized knowledge into large language models efficiently with a single command.

Supported Models

Implemented Algorithms

Requirements

  • Python 3.8+ and PyTorch 1.13.1+
  • 🤗Transformers, Datasets and Accelerate
  • sentencepiece and fire

Hardware Requirements

Model    Size   Mode   GPU RAM   Speed
LLaMA    7B     FP16   24 GB     7 s/it
LLaMA    13B    FP16   32 GB     9 s/it
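Here is a rough sanity check on the table. FP16 stores each parameter in 2 bytes, so the weights alone need about 2 bytes times the parameter count. The sketch below assumes the nominal 7B and 13B parameter counts, although the real checkpoints differ slightly. It ignores activations kept for the gradient steps and framework overhead, which account for the rest of the listed GPU RAM.

```python
# Rough FP16 weight-memory estimate (weights only; activations and
# framework overhead are not included).
BYTES_PER_FP16_PARAM = 2


def fp16_weight_gb(n_params: int) -> float:
    """Return the approximate FP16 weight size in decimal gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_FP16_PARAM / 10**9


# Nominal parameter counts (assumption: real LLaMA checkpoints differ slightly).
for name, n_params in [("LLaMA 7B", 7 * 10**9), ("LLaMA 13B", 13 * 10**9)]:
    print(f"{name}: ~{fp16_weight_gb(n_params):.0f} GB of FP16 weights")
```

Against the 24 GB and 32 GB figures, this leaves roughly 10 GB and 6 GB for everything other than the weights.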

Getting Started

Data Preparation

For example, to insert the factual knowledge "The prime minister of the UK is Rishi Sunak" into an LLM, prepare a JSON file in a format like the following.

[
  {
    "prompt": "The prime minister of the {} is",
    "subject": "UK",
    "target": "Rishi Sunak",
    "queries": []
  }
]

In this format, the "prompt" field is a natural-language description in which "{}" stands in for the subject, and the subject itself goes in the "subject" field. The "target" field holds the updated content, which differs from the original model's prediction. The "queries" field is optional: it is used to evaluate generalization and is not used for training.
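The following minimal sketch builds such a file from Python using only the standard library. The helper name make_request and the output path are hypothetical and are not part of FastEdit. The field names and the "{}" placeholder follow the format described above.

```python
import json


def make_request(prompt: str, subject: str, target: str, queries=None) -> dict:
    """Build one edit request in the JSON format shown above (hypothetical helper)."""
    # The prompt must contain exactly one "{}" placeholder for the subject.
    if prompt.count("{}") != 1:
        raise ValueError("prompt must contain exactly one '{}' placeholder")
    return {
        "prompt": prompt,
        "subject": subject,
        "target": target,
        # Optional evaluation queries; they are not used for training.
        "queries": list(queries or []),
    }


requests = [
    make_request(
        "The prime minister of the {} is",
        "UK",
        "Rishi Sunak",
        queries=["The name of prime minister of the UK is"],
    )
]

# Filling the placeholder recovers the sentence whose continuation is edited.
print(requests[0]["prompt"].format(requests[0]["subject"]))

# Serialize; write this string to e.g. data/my_edits.json (hypothetical path).
payload = json.dumps(requests, indent=2, ensure_ascii=False)
```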

Installation

git clone https://github.com/hiyouga/FastEdit.git
conda create -n fastedit python=3.10
conda activate fastedit
cd FastEdit
pip install -r requirements.txt

Alternatively, you can install the fastedit package with pip install pyfastedit.

Model Editing

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model EleutherAI/gpt-j-6b \
    --config gpt-j-6b \
    --template default
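You may prefer to launch the same edit from a Python script. A small wrapper can assemble the command, as sketched below. This is not a FastEdit API: build_edit_command and run_edit are hypothetical helpers. They only use the four flags and the CUDA_VISIBLE_DEVICES variable shown above.

```python
import os
import subprocess
import sys


def build_edit_command(data, model, config, template="default", gpu="0"):
    """Return (env, argv) for `python -m fastedit.editor` (hypothetical helper)."""
    # Restrict the run to one GPU, as CUDA_VISIBLE_DEVICES=0 does in the shell.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    argv = [
        sys.executable, "-m", "fastedit.editor",
        "--data", data,
        "--model", model,
        "--config", config,
        "--template", template,
    ]
    return env, argv


def run_edit(**kwargs):
    """Run the editor as a subprocess (requires FastEdit and a GPU)."""
    env, argv = build_edit_command(**kwargs)
    subprocess.run(argv, env=env, check=True)


env, argv = build_edit_command("data/example.json", "EleutherAI/gpt-j-6b", "gpt-j-6b")
```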

Editing LLMs: A Case

We use the samples in data/example.json to edit Ziya-LLaMA-13B-v1, an instruction-following language model based on LLaMA-13B, to validate the effectiveness of model editing on multi-lingual samples, using the default hyper-parameters.

Below are the generation results of the pre-edited and post-edited models: the pre-edited results contain obsolete factual knowledge, while the post-edited results reflect the updated knowledge.

// pre-edit
The prime minister of the United Kingdom is Boris Johnson.
// post-edit
The prime minister of the United Kingdom is Rishi Sunak.

// pre-edit
The name of prime minister of the UK is Boris Johnson.
// post-edit
The name of prime minister of the UK is Rishi Sunak.

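// pre-edit
日本的首相叫作现任日本首相是菅义伟（Suga Yoshihide）。
// English: "Japan's prime minister is called: the current prime minister of Japan is Yoshihide Suga."
// post-edit
日本的首相叫作岸田文雄。
// English: "Japan's prime minister is called Fumio Kishida."

// pre-edit
日本首相名字是现任日本首相的名字是菅义伟（Suga Yoshihide）。
// English: "The Japanese prime minister's name is: the current Japanese prime minister's name is Yoshihide Suga."
// post-edit
日本首相名字是岸田文雄
// English: "The Japanese prime minister's name is Fumio Kishida"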

You can run the following command to reproduce the above results.

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model path_to_your_ziya_13b_model \
    --config llama-13b \
    --template ziya

TODO

  • Implementing the MEMIT algorithm to edit large amounts of factual knowledge at once.
  • Leveraging an NER model to automatically identify subjects and targets in the text.
  • Exploring how to effectively edit instruction-following models without performance degradation.

License

This repository is licensed under the Apache-2.0 License.

Citation

If this work is helpful, please cite it as:

@Misc{fastedit,
  title = {FastEdit: Editing LLMs within 10 Seconds},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/FastEdit}},
  year = {2023}
}

Acknowledgement

The current codebase of this repo benefits greatly from Meng et al.'s ROME implementation. Thanks for their wonderful work.
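For readers new to ROME: its core step writes a new key-value association into one MLP projection using a closed-form rank-one update. The pure-Python sketch below shows that update under one simplifying assumption: an identity key covariance. To our understanding, this roughly corresponds to mom2_adjustment=False in the hyperparameters printed in the issues on this page.

It is an illustration, not FastEdit's implementation. Here k stands for the edited subject's key vector, and v_star stands for the optimized target value from which ROME derives its update.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def dot(a, b):
    """Inner product of two vectors."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def rank_one_edit(W, k, v_star):
    """Return W' = W + (v_star - W k) k^T / (k^T k), so that W' k == v_star.

    Simplified ROME-style update assuming an identity key covariance.
    """
    residual = [v_i - wk_i for v_i, wk_i in zip(v_star, matvec(W, k))]
    scale = dot(k, k)
    return [
        [w_ij + r_i * k_j / scale for w_ij, k_j in zip(row, k)]
        for row, r_i in zip(W, residual)
    ]


W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]       # 2x3 projection: 3-dim keys -> 2-dim values
k = [1.0, 2.0, 0.0]           # key for the edited subject
v_star = [5.0, -3.0]          # desired new value for that key

W_new = rank_one_edit(W, k, v_star)
print(matvec(W_new, k))       # the edited key now maps to v_star
# Keys orthogonal to k produce the same output before and after the edit.
print(matvec(W_new, [0.0, 0.0, 1.0]), matvec(W, [0.0, 0.0, 1.0]))
```

The update has rank one, so it changes the outputs only along the direction of k. Inputs orthogonal to k are unaffected. With a non-identity covariance, ROME replaces k^T by a covariance-weighted key, which this sketch omits.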

Related Repos


fastedit's People

Contributors

buaadreamer, hiyouga


fastedit's Issues

Dataset format

Does the dataset have to follow exactly the format of the provided example?

GPU memory usage

I'd like to understand GPU memory usage. When editing baichuan7b on a card with 24 GB of GPU memory, some edits succeed while others report OOM; even slightly longer sentences cause OOM. What is the relationship between the length of the dataset examples and GPU memory usage?

qwen support

Are there currently any plans to support the Qwen model?

Runtime error

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
Traceback (most recent call last):
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lyn/FastEdit/fastedit/editor.py", line 79, in <module>
    fire.Fire(test_rome)
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/lyn/FastEdit/fastedit/editor.py", line 46, in test_rome
    hparams = ROMEHyperParams.from_name(config)
  File "/home/lyn/FastEdit/fastedit/rome/rome_hparams.py", line 97, in from_name
    raise NotImplementedError
NotImplementedError

Could someone explain why running the following command raises this error? I have already downloaded the model locally:
CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model /home/lyn/gpt2-xl \
    --config gpt2-xl \
    --template default

LLaMA-2-7b-chat Editing failed

CUDA_VISIBLE_DEVICES=7 python -m fastedit.editor \
    --data data/example.json \
    --model /path/to/Llama-2-7b-chat-hf \
    --config llama-7b \
    --template default

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

The values become inf during generation. Does this method only work with pretrained base models such as Llama-2-7b-hf?

Error: TypeError: can't convert cuda:0 device type tensor to numpy.

Command executed:

python -m fastedit.editor \
    --data data/example.json \
    --model ../internlm-chat-7b \
    --config llama-7b \
    --template intern

Output

Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.37s/it]

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.1, v_loss_layer=31, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='model.layers.{}.mlp.down_proj', layer_module_tmp='model.layers.{}', mlp_module_tmp='model.layers.{}.mlp', attn_module_tmp='model.layers.{}.self_attn', ln_f_module='model.norm', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################
#                              #
#  Generating pre-update text  #
#                              #
################################
The prime minister of the United Kingdom is David Cameron<eoa>


The name of prime minister of the UK is The current prime minister of the UK is Boris Johnson.<eoa>


日本的首相叫作 安倍晋三<eoa>


日本首相名字是 岸田文雄<eoa>


############################
#                          #
#  Applying rome to model  #
#                          #
############################
Executing ROME algorithm for the update: [The prime minister of the UK is] -> [Rishi Sunak]
Computing left vector (u)...
Selected u projection object UK
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -6 | Sentence: The prime minister of the UK isRishi Sunak | Token:  UK
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 5.91 = 5.91 + 0.0 avg prob of [Rishi Sunak] 0.016
loss 3.773 = 3.752 + 0.021 avg prob of [Rishi Sunak] 0.0514
loss 2.498 = 2.473 + 0.025 avg prob of [Rishi Sunak] 0.1038
loss 1.481 = 1.454 + 0.027 avg prob of [Rishi Sunak] 0.2539
loss 0.769 = 0.738 + 0.031 avg prob of [Rishi Sunak] 0.4997
loss 0.273 = 0.235 + 0.037 avg prob of [Rishi Sunak] 0.804
loss 0.083 = 0.039 + 0.043 avg prob of [Rishi Sunak] 0.9628
loss 0.054 = 0.01 + 0.044 avg prob of [Rishi Sunak] 0.9896
loss 0.05 = 0.005 + 0.045 avg prob of [Rishi Sunak] 0.9952
loss 0.05 = 0.004 + 0.047 avg prob of [Rishi Sunak] 0.9965
loss 0.05 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9971
loss 0.049 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9974
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9977
loss 0.049 = 0.002 + 0.047 avg prob of [Rishi Sunak] 0.9978
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9979
loss 0.046 = 0.002 + 0.044 avg prob of [Rishi Sunak] 0.9979
loss 0.045 = 0.002 + 0.043 avg prob of [Rishi Sunak] 0.998
loss 0.043 = 0.002 + 0.041 avg prob of [Rishi Sunak] 0.9982
loss 0.04 = 0.002 + 0.038 avg prob of [Rishi Sunak] 0.9982
loss 0.037 = 0.002 + 0.035 avg prob of [Rishi Sunak] 0.9983
Delta norm: 34.503
Change in target norm: 9.031 to 35.53 => 26.499
Division Factor: 4.312
Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
    return bound(*args, **kwds)
TypeError: round() received an invalid combination of arguments - got (out=NoneType, decimals=int, ), but expected one of:
 * ()
 * (*, int decimals)
      didn't match because some of the keywords were incorrect: out


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 71, in <module>
    fire.Fire(test_rome)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 52, in test_rome
    model_new, _ = apply_rome_to_model(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/compute_v.py", line 161, in compute_v
    print(f"Right vector norm: {np.round(right_vector.norm(), 3)}")
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 3360, in round
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 68, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/torch/_tensor.py", line 970, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Error with online-quantized baichuan 13b chat: LookupError: model.layers.5.mlp.down_proj.weight

Command run:

python -m fastedit.editor \
    --data data/fastedit.json \
    --model ../../weights/Baichuan-13B-Chat/ \
    --config llama-7b \
    --template baichuan
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/llm/baichuan/code/FastEdit/fastedit/editor.py", line 79, in <module>
    fire.Fire(test_rome)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/llm/baichuan/code/FastEdit/fastedit/editor.py", line 55, in test_rome
    model_new, _ = apply_rome_to_model(
  File "/workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py", line 97, in execute_rome
    weights = {f"{hparams.rewrite_module_tmp.format(layer)}.weight":
  File "/workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py", line 98, in <dictcomp>
    nethook.get_parameter(model, f"{hparams.rewrite_module_tmp.format(layer)}
  File "/workspace/llm/baichuan/code/FastEdit/fastedit/utils/nethook.py", line 372, in get_parameter
    raise LookupError(name)
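The traceback shows how the weight to edit is located:

1. The rewrite_module_tmp template from the hyperparameters is formatted with each layer index.
2. ".weight" is appended to the result.
3. nethook.get_parameter scans model.named_parameters() and raises LookupError(name) when no name matches exactly.

The stdlib sketch below mimics that lookup on a plain list of names. The name model.layers.5.mlp.down_proj.qweight is a hypothetical example of how a quantization wrapper might store the weight under a different name. We have not confirmed what Baichuan's online quantization actually produces.

```python
def rewrite_weight_names(rewrite_module_tmp, layers):
    """Mirror the dict-comprehension keys: template.format(layer) + '.weight'."""
    return [f"{rewrite_module_tmp.format(layer)}.weight" for layer in layers]


def get_parameter(named_parameters, name):
    """Return the parameter whose name matches exactly, else raise LookupError."""
    for n, p in named_parameters:
        if n == name:
            return p
    raise LookupError(name)


# Template and layers printed for --config llama-7b in the issues on this page.
names = rewrite_weight_names("model.layers.{}.mlp.down_proj", [5])

# Hypothetical parameter listings: an unquantized model vs. a quantized one
# whose down_proj weight is stored under a different name.
plain_model = [("model.layers.5.mlp.down_proj.weight", "fp16 tensor")]
quantized_model = [("model.layers.5.mlp.down_proj.qweight", "int8 tensor")]

print(get_parameter(plain_model, names[0]))
try:
    get_parameter(quantized_model, names[0])
except LookupError as err:
    print("LookupError:", err)
```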

How do I configure config?

Is there any documentation on how to specify config and template? Specifying them directly like this gives an error:
CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model baichuan-inc/Baichuan-7B \
    --config Baichuan-7B \
    --template default

Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 71, in <module>
    fire.Fire(test_rome)
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 43, in test_rome
    hparams = ROMEHyperParams.from_name(config)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/rome/rome_hparams.py", line 91, in from_name
    raise NotImplementedError
NotImplementedError
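Both NotImplementedError tracebacks on this page end in ROMEHyperParams.from_name(config). This suggests that --config must name one of the hyperparameter sets FastEdit provides, not a Hugging Face model name like Baichuan-7B. Examples of such sets are gpt-j-6b, llama-7b, llama-13b and bloom-7b1, the config names used elsewhere on this page.

The registry below is a hypothetical stand-in, not FastEdit's code. Its two entries copy values printed in the logs on this page. It shows how a lookup of this kind fails, and how a more helpful message could be produced.

```python
# Hypothetical registry keyed by --config name; values are copied from the
# ROMEHyperParams printed in the logs on this page (only a few fields shown).
HPARAMS = {
    "llama-7b": {
        "layers": [5],
        "rewrite_module_tmp": "model.layers.{}.mlp.down_proj",
    },
    "bloom-7b1": {
        "layers": [5],
        "rewrite_module_tmp": "transformer.h.{}.mlp.dense_4h_to_h",
    },
}


def from_name(name: str) -> dict:
    """Look up hyperparameters by config name, failing with a readable message."""
    try:
        return HPARAMS[name]
    except KeyError:
        known = ", ".join(sorted(HPARAMS))
        raise NotImplementedError(
            f"no hyperparameters for config {name!r}; known configs: {known}"
        ) from None


print(from_name("llama-7b")["rewrite_module_tmp"])
try:
    from_name("Baichuan-7B")
except NotImplementedError as err:
    print(err)
```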

Llama-2-7b-chat - RuntimeError: Inference tensors cannot be saved for backward

The model is: Llama-2-7b-chat (https://huggingface.co/meta-llama/Llama-2-7b-chat)

{'prompt': 'A patient diagnosed with carcinoma of {} presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?', 'subject': 'lung', 'target': 'IV fluids and furosemide', 'queries': []}

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
Traceback (most recent call last):
  File "/mnt/lustre/bo/medical_llm/evaluate_model_with_multiple_datasets.py", line 300, in
  File "/mnt/lustre/bo/medical_llm/edit_util.py", line 50, in edit_model
    model_new, _ = apply_rome_to_model(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/compute_v.py", line 97, in compute_v
    logits = model(**input_tok).logits
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
    outputs = self.model(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 925, in forward
    layer_outputs = decoder_layer(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 632, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 113, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.

RuntimeError when computing the v vector

Example:

[{"prompt": "{} was born in a city ", "subject": "Ada Yonath", "target": "Frankfurt",
"queries": ["The birth city of Ada Yonath was "]}]

Command:

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor --data nobel_dataset.json --model bigscience/bloom-7b1 --config bloom-7b1

Output:

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.2, v_loss_layer=29, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='transformer.h.{}.mlp.dense_4h_to_h', layer_module_tmp='transformer.h.{}', mlp_module_tmp='transformer.h.{}.mlp', attn_module_tmp='transformer.h.{}.self_attention', ln_f_module='transformer.ln_f', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################
#                              #
#  Generating pre-update text  #
#                              #
################################
The birth city of Ada Yonath was Tel Aviv, Israel. She was born in the Tel Aviv neighborhood of Neve Shalom. Her father, Yitzhak Yonath, was a professor of physics at the Technion, and her mother, Shulamit, was a teacher. She has two brothers, Yaron and Yitzhak, and two sisters, Shira and Shulamit. She has a younger sister, Yael, who is a mathematician. She has a

############################
#                          #
#  Applying rome to model  #
#                          #
############################
Executing ROME algorithm for the update: [Ada Yonath was born in a city ] -> [Frankfurt]
Computing left vector (u)...
Selected u projection object Ada Yonath
Left vector shape: torch.Size([16384])
Computing right vector (v)
Traceback (most recent call last):
File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/maxim758/FastEdit/fastedit/editor.py", line 79, in
fire.Fire(test_rome)
File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/maxim758/FastEdit/fastedit/editor.py", line 55, in test_rome
model_new, _ = apply_rome_to_model(
File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
right_vector: torch.Tensor = compute_v(
File "/home/maxim758/FastEdit/fastedit/rome/compute_v.py", line 47, in compute_v
rewriting_targets[i, -target_len-1:-1] = input_tok["input_ids"][i, -target_len:].clone() # build labels
RuntimeError: The expanded size of the tensor (0) must match the existing size (18) at non-singleton dimension 0. Target sizes: [0]. Tensor sizes: [18]
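The sizes in this traceback are consistent with `target_len` being 0. In Python `-0 == 0`, so `[-target_len:]` selects the whole row (18 tokens here), while `[-target_len-1:-1]` selects nothing. The issue does not show why the target tokenized to zero tokens for BLOOM. The sketch below only reproduces the slicing behaviour, using plain lists in place of the token tensors; `checked_target_len` is a hypothetical guard, not part of FastEdit:

```python
from typing import List, Tuple

def label_slice_lengths(seq_len: int, target_len: int) -> Tuple[int, int]:
    """Lengths of the destination and source slices in the label-building line."""
    row = list(range(seq_len))        # stand-in for one row of input_ids
    dest = row[-target_len - 1:-1]    # like rewriting_targets[i, -target_len-1:-1]
    src = row[-target_len:]           # like input_ids[i, -target_len:]
    return len(dest), len(src)

def checked_target_len(target_ids: List[int]) -> int:
    # Hypothetical guard: fail early with a clear message instead of a shape error
    if len(target_ids) == 0:
        raise ValueError("target string produced no tokens")
    return len(target_ids)

print(label_slice_lengths(18, 3))  # (3, 3): shapes agree
print(label_slice_lengths(18, 0))  # (0, 18): the sizes reported in the error
```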

[Llama-2-7b-chat] RuntimeError: expected scalar type Float but found Half

Loading the 32-bit Llama-2-7b-chat-hf model directly:
model = AutoModelForCausalLM.from_pretrained(
    model_path
)
produces the following error:

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 3.252 = 3.252 + 0.0 avg prob of [IV fluids and furosemide] 0.0395
loss 2.999 = 2.996 + 0.003 avg prob of [IV fluids and furosemide] 0.0508
loss 2.518 = 2.51 + 0.009 avg prob of [IV fluids and furosemide] 0.0823
loss 2.148 = 2.056 + 0.092 avg prob of [IV fluids and furosemide] 0.1295
loss 1.609 = 1.539 + 0.07 avg prob of [IV fluids and furosemide] 0.2176
loss 1.005 = 0.935 + 0.07 avg prob of [IV fluids and furosemide] 0.395
loss 0.443 = 0.349 + 0.094 avg prob of [IV fluids and furosemide] 0.7071
loss 0.168 = 0.09 + 0.079 avg prob of [IV fluids and furosemide] 0.9143
loss 0.059 = 0.025 + 0.034 avg prob of [IV fluids and furosemide] 0.9755
loss 0.055 = 0.019 + 0.036 avg prob of [IV fluids and furosemide] 0.9812
loss 0.042 = 0.008 + 0.035 avg prob of [IV fluids and furosemide] 0.9923
loss 0.037 = 0.005 + 0.032 avg prob of [IV fluids and furosemide] 0.9954
loss 0.035 = 0.004 + 0.031 avg prob of [IV fluids and furosemide] 0.9957
loss 0.032 = 0.004 + 0.028 avg prob of [IV fluids and furosemide] 0.9963
loss 0.029 = 0.003 + 0.026 avg prob of [IV fluids and furosemide] 0.9969
loss 0.026 = 0.003 + 0.023 avg prob of [IV fluids and furosemide] 0.9973
loss 0.023 = 0.002 + 0.02 avg prob of [IV fluids and furosemide] 0.9976
loss 0.02 = 0.002 + 0.018 avg prob of [IV fluids and furosemide] 0.9979
loss 0.019 = 0.002 + 0.017 avg prob of [IV fluids and furosemide] 0.998
loss 0.017 = 0.002 + 0.015 avg prob of [IV fluids and furosemide] 0.9982
Delta norm: 17.499
Change in target norm: 4.375 to 18.048 => 13.673
Division Factor: 3.688
Right vector norm: 4.746
Right vector shape: torch.Size([4096])

Traceback (most recent call last):
File "/data/a/zhangbo/CAP_medical_LLM/evaluate_model_with_multiple_datasets.py", line 300, in
edit_model(global_model, global_tokenizer, list_of_dicts, 'llama-7b')
File "/data/a/zhangbo/CAP_medical_LLM/edit_util.py", line 50, in edit_model
model_new, _ = apply_rome_to_model(
File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 134, in execute_rome
upd_matrix = left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)
RuntimeError: expected scalar type Float but found Half

======

If I load the model in 16-bit:
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
).bfloat16()

a similar error occurs:
RuntimeError: expected scalar type BFloat16 but found Half
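Both tracebacks fail at the outer product of `left_vector` and `right_vector`. That suggests the two vectors end up in different dtypes, for example one in the model's half precision and the other in float32. The issue does not show which vector has which dtype. Below is a minimal sketch of a dtype-safe rank-one update with a hypothetical helper name. It computes in float32 and casts back to the weight's dtype; the transpose branch assumes an `nn.Linear`-style `(out, in)` weight layout:

```python
import torch

def apply_rank_one_update(weight: torch.Tensor,
                          left_vector: torch.Tensor,
                          right_vector: torch.Tensor) -> torch.Tensor:
    """Return weight + outer(left_vector, right_vector) in weight's dtype.

    Hypothetical helper: both vectors are upcast to float32 before the
    outer product, so mixed Half/Float/BFloat16 inputs no longer clash.
    """
    upd = left_vector.float().unsqueeze(1) @ right_vector.float().unsqueeze(0)
    if upd.shape != weight.shape:
        # e.g. LLaMA down_proj stores (hidden, intermediate) = (4096, 11008),
        # while outer(left, right) is (intermediate, hidden); transpose to match
        if upd.T.shape != weight.shape:
            raise ValueError(f"update {tuple(upd.shape)} does not fit weight {tuple(weight.shape)}")
        upd = upd.T
    return (weight.float() + upd).to(weight.dtype)

# Toy shapes standing in for down_proj: weight (4, 6), left 6-dim, right 4-dim
weight = torch.zeros(4, 6, dtype=torch.float16)
left = torch.ones(6, dtype=torch.float16)  # half precision
right = torch.ones(4)                      # float32
new_weight = apply_rank_one_update(weight, left, right)
print(new_weight.dtype, new_weight.shape)  # torch.float16 torch.Size([4, 6])
```

Doing the arithmetic in float32 also avoids relying on half-precision matmul kernels, which older PyTorch builds lack on CPU.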

Why modify down_proj in LLaMA?

Hello,

Thank you very much for this implementation.

In the LLaMA implementation, why and how did you choose to edit the down_proj layer instead of gate_proj or up_proj in the MLP module? Thank you very much!

Best,
Wenyue
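One rationale comes from the ROME paper and is not the maintainers' answer. ROME treats an MLP's output projection as a linear key-value memory: the key k is the projection's input, and the value is its output. A rank-one change to that matrix can then map a chosen key to a new value in closed form. In a LLaMA-style gated MLP, `down_proj` is the layer in that position. Its input is the intermediate activation (11008-dimensional for Llama-2-7B, matching the left-vector shape in the Llama-2-7b-chat log on this page). The BLOOM hyperparameters on this page target the analogous `dense_4h_to_h`. Below is a toy sketch with made-up sizes. It uses the identity in place of ROME's key covariance C:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    """Toy LLaMA-style MLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)

    def key(self, x: torch.Tensor) -> torch.Tensor:
        # The input to down_proj: what ROME treats as the key k
        return F.silu(self.gate_proj(x)) * self.up_proj(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.key(x))

torch.manual_seed(0)
mlp = GatedMLP(d_model=8, d_ff=32)
x = torch.randn(8)       # toy hidden state at the subject token
v_star = torch.randn(8)  # toy desired MLP output

with torch.no_grad():
    k = mlp.key(x)
    u = k / k.norm()                                     # left vector (C = I)
    lam = (v_star - mlp.down_proj(k)) / torch.dot(k, u)  # right vector
    mlp.down_proj.weight += torch.outer(lam, u)          # weight is (d_model, d_ff)

print(torch.allclose(mlp(x), v_star, atol=1e-4))  # True: the key now maps to v_star
```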

Error occurs when editing Baichuan-13B

loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan

The gradient of the delta weight becomes NaN after the first backward pass.

Using:

with torch.autograd.detect_anomaly():
    loss.backward()

the script caught a runtime error:

RuntimeError: Function 'MmBackward0' returned nan values in its 0th output.

I suspect it may be related to the ALiBi attention masks of Baichuan-13B.
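The two snippets below illustrate the debugging step described here; they are not a diagnosis of Baichuan-13B. First, `detect_anomaly` names the first backward node that returns NaN. Second, a softmax over a row whose scores are all `-inf` returns NaN, which is one common way attention masks introduce NaN. Whether Baichuan-13B's ALiBi masks produce such rows is the poster's hypothesis and is not confirmed here:

```python
import torch

def fully_masked_softmax() -> torch.Tensor:
    # An attention row in which every position is masked with -inf
    scores = torch.full((4,), float("-inf"))
    return torch.softmax(scores, dim=-1)

def locate_nan_backward():
    """Return the anomaly-mode error text, or None if backward stays finite."""
    x = torch.zeros(1, requires_grad=True)
    loss = (torch.sqrt(x) * 0.0).sum()  # forward is finite (0.0)
    try:
        with torch.autograd.detect_anomaly():
            loss.backward()  # d sqrt(x)/dx at 0 is inf; inf * 0 gives NaN
    except RuntimeError as err:
        return str(err)
    return None

print(fully_masked_softmax())  # tensor([nan, nan, nan, nan])
print(locate_nan_backward())   # names the backward node that produced NaN
```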

Is a term being ignored in the delta calculation?

if hparams.mom2_adjustment:
    u = get_inv_cov(
        model,
        tokenizer,
        hparams.rewrite_module_tmp.format(layer),
        hparams.mom2_dataset,
        hparams.mom2_n_samples,
        hparams.mom2_dtype
    ) @ u.unsqueeze(1)
    u = u.squeeze()

I noticed that get_inv_cov is not implemented, and that this value corresponds to the constant C in the original paper:
[image: equation from the original paper]
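For reference, here is the closed-form update from the ROME paper (Meng et al., 2022). Here C = KK^T is the uncentered covariance of keys, estimated from a text sample. We read the code snippet in this issue with `cur_input` = k_*, `cur_output` = W k_*, `target` = v_*, and `left_vector` = u; this mapping is our interpretation of the variable names. Under that reading, the code computes Λ with u ∝ C^{-1}k_* when `mom2_adjustment` is enabled, and with u ∝ k_* (that is, C = I) when it is not:

```latex
\hat{W} = W + \Lambda\,(C^{-1}k_*)^\top,
\qquad
\Lambda = \frac{v_* - W k_*}{(C^{-1}k_*)^\top k_*},
\qquad
C = K K^\top

% Check: \hat{W} k_* = W k_* + \Lambda\,(C^{-1}k_*)^\top k_* = v_*.
% Normalising u = C^{-1}k_* / \lVert C^{-1}k_* \rVert rescales \Lambda by the
% same factor, so \Lambda' = (v_* - W k_*) / (u^\top k_*) gives \Lambda' u^\top = \Lambda (C^{-1}k_*)^\top.
% With C = I: u = k_* / \lVert k_* \rVert.
```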

And for this code snippet:

right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)

The calculation of Λ simply ignores this constant.
In my experiments, this may cause a small fraction of edits to fail to apply.

I wonder why the get_inv_cov function was left unimplemented. If it is tricky, is there an alternative solution, such as directly adding the constants for each model to the hyperparameters?

Looking forward to your reply. 🙂
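On the question of an alternative, below is a minimal sketch of what a `get_inv_cov` replacement could do. It is not FastEdit's code, and the helper names are hypothetical. It assumes the keys (inputs to the rewritten module) have already been collected from a text sample, as ROME does with the `mom2_dataset`/`mom2_n_samples` settings shown in the BLOOM hyperparameters on this page. The small ridge term is our own addition to keep C invertible:

```python
import torch

def second_moment(keys: torch.Tensor) -> torch.Tensor:
    """Uncentered covariance C = K K^T / n from keys stacked as rows (n, d)."""
    return keys.T @ keys / keys.shape[0]

def left_vector(C: torch.Tensor, k: torch.Tensor, ridge: float = 1e-4) -> torch.Tensor:
    """u proportional to C^{-1} k, via a linear solve instead of an explicit inverse."""
    eye = torch.eye(C.shape[0], dtype=C.dtype)
    u = torch.linalg.solve(C + ridge * eye, k)
    return u / u.norm()

def rank_one_edit(W: torch.Tensor, C: torch.Tensor,
                  k: torch.Tensor, v_star: torch.Tensor) -> torch.Tensor:
    """Return W' with W' k = v_star, where W maps keys to values (v = W k)."""
    u = left_vector(C, k)
    lam = (v_star - W @ k) / torch.dot(k, u)  # ROME's Lambda
    return W + torch.outer(lam, u)

torch.manual_seed(0)
d_in, d_out = 16, 8
keys = torch.randn(1000, d_in, dtype=torch.float64)  # stand-in for collected keys
C = second_moment(keys)
W = torch.randn(d_out, d_in, dtype=torch.float64)
k = torch.randn(d_in, dtype=torch.float64)
v_star = torch.randn(d_out, dtype=torch.float64)
W_new = rank_one_edit(W, C, k, v_star)
print(torch.allclose(W_new @ k, v_star))  # True: the edited key maps to v_star
```

C is a d×d matrix rather than a scalar (d = 11008 for Llama-2-7B's down_proj input). In practice it would therefore likely be cached per model and layer rather than stored as a single hyperparameter.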
