hiyouga / fastedit


🩹Editing large language models within 10 seconds⚡

License: Apache License 2.0

Python 100.00%

bloom chatbots chatgpt falcon gpt large-language-models llama llms pytorch transformers

fastedit's Introduction

Yaowei Zheng

Ph.D. Student

Beihang University

37 Xueyuan Rd., Haidian Dist.

Beijing, China, 100191

Education

2022.09-Present School of Computer Science and Engineering, Beihang University Ph.D.
2017.09-2021.06 Shen Yuan Honors College, Beihang University B.Eng.

Research Interests

Natural Language Processing
Large Language Models

Skills

Natural Language: Chinese (Native); English (CET-6); Japanese (JLPT-N2)
Programming Language: Python; C++; Java; JavaScript; PHP; Go; Verilog HDL; MATLAB
Typesetting Language: LaTeX; Markdown
Programming Framework: PyTorch; TensorFlow

Publications (Google Scholar, DBLP, Semantic Scholar, ORCID)

Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo and Yongqiang Ma: LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. ACL2024. [arXiv]
Junfan Chen, Richong Zhang, Yaowei Zheng, Qianben Chen, Chunming Hu and Yongyi Mao: DualCL: Principled Supervised Contrastive Learning as Mutual Information Maximization for Text Classification. WWW2024. [DOI][arXiv][Code]
Richong Zhang, Qianben Chen, Yaowei Zheng, Samuel Mensah and Yongyi Mao: Aspect-level Sentiment Analysis via a Syntax-based Neural Network. IEEE/ACM Transactions on Audio, Speech, and Language Processing. [DOI]
Xiaohui Guo, Richong Zhang, Yaowei Zheng and Yongyi Mao: Robust Regularization with Adversarial Labelling of Perturbed Samples. IJCAI2021. [DOI][arXiv]
Yaowei Zheng, Richong Zhang and Yongyi Mao: Regularizing Neural Networks via Adversarial Model Perturbation. CVPR2021. [DOI][arXiv][Code][Poster][Video]
Yaowei Zheng, Richong Zhang, Suyuchen Wang, Samuel Mensah and Yongyi Mao: Anchored Model Transfer and Soft Instance Transfer for Cross-Task Cross-Domain Learning: A Study Through Aspect-Level Sentiment Classification. WWW2020. [DOI]
Yaowei Zheng, Richong Zhang, Samuel Mensah and Yongyi Mao: Replicate, Walk, and Stop on Syntax: an Effective Neural Network Model for Aspect-Level Sentiment Classification. AAAI2020. [DOI][Code]

Academic Service

Conference Reviewer: AAAI, EMNLP, NAACL, COLING
Journal Reviewer: Neural Computation


fastedit's Issues

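GPU memory usage

I would like to understand GPU memory usage. When editing baichuan7b on a card with 24 GB of memory, some edits succeed while others hit OOM; even slightly longer sentences trigger OOM. What is the relationship between the length of the dataset examples and GPU memory usage?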

Error: TypeError: can't convert cuda:0 device type tensor to numpy.

Command:

python -m fastedit.editor \
    --data data/example.json \
    --model ../internlm-chat-7b \
    --config llama-7b \
    --template intern

Output:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.37s/it]

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.1, v_loss_layer=31, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='model.layers.{}.mlp.down_proj', layer_module_tmp='model.layers.{}', mlp_module_tmp='model.layers.{}.mlp', attn_module_tmp='model.layers.{}.self_attn', ln_f_module='model.norm', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################
#                              #
#  Generating pre-update text  #
#                              #
################################
The prime minister of the United Kingdom is David Cameron<eoa>


The name of prime minister of the UK is The current prime minister of the UK is Boris Johnson.<eoa>


日本的首相叫作 安倍晋三<eoa>


日本首相名字是 岸田文雄<eoa>


############################
#                          #
#  Applying rome to model  #
#                          #
############################
Executing ROME algorithm for the update: [The prime minister of the UK is] -> [Rishi Sunak]
Computing left vector (u)...
Selected u projection object UK
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -6 | Sentence: The prime minister of the UK isRishi Sunak | Token:  UK
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 5.91 = 5.91 + 0.0 avg prob of [Rishi Sunak] 0.016
loss 3.773 = 3.752 + 0.021 avg prob of [Rishi Sunak] 0.0514
loss 2.498 = 2.473 + 0.025 avg prob of [Rishi Sunak] 0.1038
loss 1.481 = 1.454 + 0.027 avg prob of [Rishi Sunak] 0.2539
loss 0.769 = 0.738 + 0.031 avg prob of [Rishi Sunak] 0.4997
loss 0.273 = 0.235 + 0.037 avg prob of [Rishi Sunak] 0.804
loss 0.083 = 0.039 + 0.043 avg prob of [Rishi Sunak] 0.9628
loss 0.054 = 0.01 + 0.044 avg prob of [Rishi Sunak] 0.9896
loss 0.05 = 0.005 + 0.045 avg prob of [Rishi Sunak] 0.9952
loss 0.05 = 0.004 + 0.047 avg prob of [Rishi Sunak] 0.9965
loss 0.05 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9971
loss 0.049 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9974
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9977
loss 0.049 = 0.002 + 0.047 avg prob of [Rishi Sunak] 0.9978
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9979
loss 0.046 = 0.002 + 0.044 avg prob of [Rishi Sunak] 0.9979
loss 0.045 = 0.002 + 0.043 avg prob of [Rishi Sunak] 0.998
loss 0.043 = 0.002 + 0.041 avg prob of [Rishi Sunak] 0.9982
loss 0.04 = 0.002 + 0.038 avg prob of [Rishi Sunak] 0.9982
loss 0.037 = 0.002 + 0.035 avg prob of [Rishi Sunak] 0.9983
Delta norm: 34.503
Change in target norm: 9.031 to 35.53 => 26.499
Division Factor: 4.312
Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
    return bound(*args, **kwds)
TypeError: round() received an invalid combination of arguments - got (out=NoneType, decimals=int, ), but expected one of:
 * ()
 * (*, int decimals)
      didn't match because some of the keywords were incorrect: out


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 71, in <module>
    fire.Fire(test_rome)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 52, in test_rome
    model_new, _ = apply_rome_to_model(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/compute_v.py", line 161, in compute_v
    print(f"Right vector norm: {np.round(right_vector.norm(), 3)}")
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 3360, in round
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 68, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/torch/_tensor.py", line 970, in __array__
    return self.numpy()
**TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.**
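
The last frames of this traceback point at `np.round(right_vector.norm(), 3)` in `compute_v.py`: NumPy tries to turn the CUDA tensor into an array and fails. One possible workaround, sketched here as an assumption and not as the project's actual fix, is to convert the 0-dim norm to a Python float with `.item()` before rounding. The helper name `norm_for_logging` is hypothetical.

```python
import torch


def norm_for_logging(vec: torch.Tensor, ndigits: int = 3) -> float:
    """Return the vector norm as a rounded Python float.

    .item() copies the 0-dim result to host memory, so this works for
    tensors on any device without going through NumPy.
    """
    return round(vec.norm().item(), ndigits)


# CPU tensor used here only so the sketch runs anywhere.
right_vector = torch.tensor([3.0, 4.0])
print(f"Right vector norm: {norm_for_logging(right_vector)}")
```

The same call should work unchanged for a tensor on `cuda:0`, since `.item()` handles the device-to-host copy.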

[Llama-2-7b-chat] RuntimeError: expected scalar type Float but found Half

Loading the 32-bit Llama-2-7b-chat-hf model directly:

model = AutoModelForCausalLM.from_pretrained(
    model_path
)

produces the following error:

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 3.252 = 3.252 + 0.0 avg prob of [IV fluids and furosemide] 0.0395
loss 2.999 = 2.996 + 0.003 avg prob of [IV fluids and furosemide] 0.0508
loss 2.518 = 2.51 + 0.009 avg prob of [IV fluids and furosemide] 0.0823
loss 2.148 = 2.056 + 0.092 avg prob of [IV fluids and furosemide] 0.1295
loss 1.609 = 1.539 + 0.07 avg prob of [IV fluids and furosemide] 0.2176
loss 1.005 = 0.935 + 0.07 avg prob of [IV fluids and furosemide] 0.395
loss 0.443 = 0.349 + 0.094 avg prob of [IV fluids and furosemide] 0.7071
loss 0.168 = 0.09 + 0.079 avg prob of [IV fluids and furosemide] 0.9143
loss 0.059 = 0.025 + 0.034 avg prob of [IV fluids and furosemide] 0.9755
loss 0.055 = 0.019 + 0.036 avg prob of [IV fluids and furosemide] 0.9812
loss 0.042 = 0.008 + 0.035 avg prob of [IV fluids and furosemide] 0.9923
loss 0.037 = 0.005 + 0.032 avg prob of [IV fluids and furosemide] 0.9954
loss 0.035 = 0.004 + 0.031 avg prob of [IV fluids and furosemide] 0.9957
loss 0.032 = 0.004 + 0.028 avg prob of [IV fluids and furosemide] 0.9963
loss 0.029 = 0.003 + 0.026 avg prob of [IV fluids and furosemide] 0.9969
loss 0.026 = 0.003 + 0.023 avg prob of [IV fluids and furosemide] 0.9973
loss 0.023 = 0.002 + 0.02 avg prob of [IV fluids and furosemide] 0.9976
loss 0.02 = 0.002 + 0.018 avg prob of [IV fluids and furosemide] 0.9979
loss 0.019 = 0.002 + 0.017 avg prob of [IV fluids and furosemide] 0.998
loss 0.017 = 0.002 + 0.015 avg prob of [IV fluids and furosemide] 0.9982
Delta norm: 17.499
Change in target norm: 4.375 to 18.048 => 13.673
Division Factor: 3.688
Right vector norm: 4.746
Right vector shape: torch.Size([4096])

Traceback (most recent call last):
File "/data/a/zhangbo/CAP_medical_LLM/evaluate_model_with_multiple_datasets.py", line 300, in <module>
edit_model(global_model, global_tokenizer, list_of_dicts, 'llama-7b')
File "/data/a/zhangbo/CAP_medical_LLM/edit_util.py", line 50, in edit_model
model_new, _ = apply_rome_to_model(
File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 134, in execute_rome
upd_matrix = left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)
RuntimeError: expected scalar type Float but found Half

======

If the model is loaded in 16-bit instead:

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
).bfloat16()

a similar error occurs:
RuntimeError: expected scalar type BFloat16 but found Half
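
Both errors come from the outer product `left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)`, whose operands apparently end up in different dtypes. The logs do not show which operand is Half, so the following is only a minimal sketch of one way to avoid the mismatch: cast both vectors to the dtype and device of the weight being edited before multiplying. The function name `outer_update` and the small tensor sizes are hypothetical stand-ins.

```python
import torch


def outer_update(left_vector: torch.Tensor,
                 right_vector: torch.Tensor,
                 weight: torch.Tensor) -> torch.Tensor:
    """Rank-one update matrix computed in the weight's dtype and device."""
    left = left_vector.to(device=weight.device, dtype=weight.dtype)
    right = right_vector.to(device=weight.device, dtype=weight.dtype)
    return left.unsqueeze(1) @ right.unsqueeze(0)


weight = torch.zeros(6, 4)        # float32 stand-in for the edited weight
left = torch.randn(6)             # float32 vector
right = torch.randn(4).half()     # float16 vector, mimicking the mismatch

upd = outer_update(left, right, weight)
print(upd.dtype, tuple(upd.shape))
```

Casting to the weight's dtype keeps the update consistent with the tensor it will be added to, whichever precision the model was loaded in.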

Dataset format

Does the dataset have to follow exactly the format used in the provided example?
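
For orientation, the request examples quoted in other issues on this page all use the keys `prompt` (with a `{}` placeholder for the subject), `subject`, `target`, and `queries`. The sketch below checks a JSON string against that shape. Which keys are strictly required is an assumption inferred from those examples, and `validate_requests` is a hypothetical helper, not part of FastEdit.

```python
import json

# Assumed to be required, based on the examples shown on this page.
REQUIRED_KEYS = ("prompt", "subject", "target")


def validate_requests(raw: str) -> list:
    """Parse a JSON list of edit requests and check its basic shape."""
    requests = json.loads(raw)
    if not isinstance(requests, list):
        raise ValueError("expected a JSON list of edit requests")
    for i, req in enumerate(requests):
        missing = [k for k in REQUIRED_KEYS if k not in req]
        if missing:
            raise ValueError(f"request {i} is missing keys: {missing}")
        if req["prompt"].count("{}") != 1:
            raise ValueError(f"request {i}: prompt needs exactly one '{{}}' placeholder")
        req.setdefault("queries", [])  # treat queries as optional
    return requests


example = (
    '[{"prompt": "{} was born in a city ", "subject": "Ada Yonath", '
    '"target": "Frankfurt", "queries": ["The birth city of Ada Yonath was "]}]'
)
requests = validate_requests(example)
full_prompt = requests[0]["prompt"].format(requests[0]["subject"])
print(repr(full_prompt))
```

The formatted prompt is the text the logs print inside brackets in the "Executing ROME algorithm for the update" line.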

RuntimeError: computing v Vector

Example:

[{"prompt": "{} was born in a city ", "subject": "Ada Yonath", "target": "Frankfurt",
"queries": ["The birth city of Ada Yonath was "]}]

Command:

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor --data nobel_dataset.json --model bigscience/bloom-7b1 --config bloom-7b1

Output:

################################

Retrieving hyperparameters

################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.2, v_loss_layer=29, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='transformer.h.{}.mlp.dense_4h_to_h', layer_module_tmp='transformer.h.{}', mlp_module_tmp='transformer.h.{}.mlp', attn_module_tmp='transformer.h.{}.self_attention', ln_f_module='transformer.ln_f', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################

Generating pre-update text

################################
The birth city of Ada Yonath was Tel Aviv, Israel. She was born in the Tel Aviv neighborhood of Neve Shalom. Her father, Yitzhak Yonath, was a professor of physics at the Technion, and her mother, Shulamit, was a teacher. She has two brothers, Yaron and Yitzhak, and two sisters, Shira and Shulamit. She has a younger sister, Yael, who is a mathematician. She has a

############################

Applying rome to model

############################
Executing ROME algorithm for the update: [Ada Yonath was born in a city ] -> [Frankfurt]
Computing left vector (u)...
Selected u projection object Ada Yonath
Left vector shape: torch.Size([16384])
Computing right vector (v)
Traceback (most recent call last):
File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/maxim758/FastEdit/fastedit/editor.py", line 79, in <module>
fire.Fire(test_rome)
File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/maxim758/FastEdit/fastedit/editor.py", line 55, in test_rome
model_new, _ = apply_rome_to_model(
File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
right_vector: torch.Tensor = compute_v(
File "/home/maxim758/FastEdit/fastedit/rome/compute_v.py", line 47, in compute_v
rewriting_targets[i, -target_len-1:-1] = input_tok["input_ids"][i, -target_len:].clone() # build labels
RuntimeError: The expanded size of the tensor (0) must match the existing size (18) at non-singleton dimension 0. Target sizes: [0]. Tensor sizes: [18]

Where is the edited model saved?

Are the edited model weights simply modified on top of the original weights?

Why modifying down_proj in llama?

Hello,

Thank you very much for this implementation.

In the Llama implementation, I wonder why and how you chose to edit the down_proj layer instead of gate_proj or up_proj in the MLP module. Thank you very much!

Best,
Wenyue

Error at runtime

################################

Retrieving hyperparameters

################################
Traceback (most recent call last):
File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/lyn/FastEdit/fastedit/editor.py", line 79, in <module>
fire.Fire(test_rome)
File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/lyn/FastEdit/fastedit/editor.py", line 46, in test_rome
hparams = ROMEHyperParams.from_name(config)
File "/home/lyn/FastEdit/fastedit/rome/rome_hparams.py", line 97, in from_name
raise NotImplementedError
NotImplementedError

What is causing this error when I run the following command? I have already downloaded the model locally:

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model /home/lyn/gpt2-xl \
    --config gpt2-xl \
    --template default

Is there any way to apply this interesting algorithm to the chatGLM-6B or chatGLM2-6B models?

thx

How should config be set?

Is there any documentation on how to specify config and template? Specifying them directly like this raises an error:

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model baichuan-inc/Baichuan-7B \
    --config Baichuan-7B \
    --template default

Traceback (most recent call last):
File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 71, in <module>
fire.Fire(test_rome)
File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 43, in test_rome
hparams = ROMEHyperParams.from_name(config)
File "/home/ec2-user/SageMaker/FastEdit/fastedit/rome/rome_hparams.py", line 91, in from_name
raise NotImplementedError
NotImplementedError
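
The traceback shows `ROMEHyperParams.from_name(config)` raising a bare `NotImplementedError`, which suggests the config name is looked up against a fixed set of known names. On this page, `llama-7b` and `bloom-7b1` both produce hyperparameter dumps, and another issue runs a Baichuan model with `--config llama-7b`. The sketch below illustrates such a lookup with a more descriptive error. It is not the FastEdit implementation: `HParamsSketch`, `KNOWN_CONFIGS`, and this `from_name` are hypothetical, and only a few fields are copied from the dumps.

```python
from dataclasses import dataclass


@dataclass
class HParamsSketch:
    """A small subset of the fields printed in the hyperparameter dumps."""
    layers: list
    v_lr: float
    v_loss_layer: int
    rewrite_module_tmp: str


# Values copied from the ROMEHyperParams(...) lines in the logs on this page.
KNOWN_CONFIGS = {
    "llama-7b": HParamsSketch([5], 0.1, 31, "model.layers.{}.mlp.down_proj"),
    "bloom-7b1": HParamsSketch([5], 0.2, 29, "transformer.h.{}.mlp.dense_4h_to_h"),
}


def from_name(name: str) -> HParamsSketch:
    """Look up a config by name, listing the known names on failure."""
    try:
        return KNOWN_CONFIGS[name]
    except KeyError:
        supported = ", ".join(sorted(KNOWN_CONFIGS))
        raise NotImplementedError(
            f"unknown config {name!r}; known in this sketch: {supported}"
        ) from None


print(from_name("llama-7b").rewrite_module_tmp.format(5))
try:
    from_name("Baichuan-7B")
except NotImplementedError as err:
    print(err)
```

Under this reading, `--config` names a hyperparameter preset and not the model itself, so a model with a Llama-style layer layout might reuse `llama-7b`; whether that gives good edits is not something this page establishes.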

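Tried baichuan13b; the results do not seem very good

This method seems to behave somewhat like prompting: the output is not very stable, and the correct answer is not always produced. I ran the examples that ship with the example file and found the results to be very unstable.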

LookupError: model.layers.5.mlp.down_proj.weight when using Baichuan 13B Chat with online quantization

Command:

python -m fastedit.editor \
    --data data/fastedit.json \
    --model ../../weights/Baichuan-13B-Chat/ \
    --config llama-7b \
    --template baichuan

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/conda/lib/python3.10/runpy.py:196 in _run_module_as_main                                    │
│                                                                                                  │
│   193 │   main_globals = sys.modules["__main__"].__dict__                                        │
│   194 │   if alter_argv:                                                                         │
│   195 │   │   sys.argv[0] = mod_spec.origin                                                      │
│ ❱ 196 │   return _run_code(code, main_globals, None,                                             │
│   197 │   │   │   │   │    "__main__", mod_spec)                                                 │
│   198                                                                                            │
│   199 def run_module(mod_name, init_globals=None,                                                │
│                                                                                                  │
│ /opt/conda/lib/python3.10/runpy.py:86 in _run_code                                               │
│                                                                                                  │
│    83 │   │   │   │   │      __loader__ = loader,                                                │
│    84 │   │   │   │   │      __package__ = pkg_name,                                             │
│    85 │   │   │   │   │      __spec__ = mod_spec)                                                │
│ ❱  86 │   exec(code, run_globals)                                                                │
│    87 │   return run_globals                                                                     │
│    88                                                                                            │
│    89 def _run_module_code(code, init_globals=None,                                              │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/editor.py:79 in <module>                          │
│                                                                                                  │
│   76                                                                                             │
│   77                                                                                             │
│   78 if __name__ == "__main__":                                                                  │
│ ❱ 79 │   fire.Fire(test_rome)                                                                    │
│   80                                                                                             │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/fire/core.py:141 in Fire                                 │
│                                                                                                  │
│   138 │   context.update(caller_globals)                                                         │
│   139 │   context.update(caller_locals)                                                          │
│   140                                                                                            │
│ ❱ 141   component_trace = _Fire(component, args, parsed_flag_args, context, name)                │
│   142                                                                                            │
│   143   if component_trace.HasError():                                                           │
│   144 │   _DisplayError(component_trace)                                                         │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/fire/core.py:475 in _Fire                                │
│                                                                                                  │
│   472 │     is_class = inspect.isclass(component)                                                │
│   473 │                                                                                          │
│   474 │     try:                                                                                 │
│ ❱ 475 │   │   component, remaining_args = _CallAndUpdateTrace(                                   │
│   476 │   │   │   component,                                                                     │
│   477 │   │   │   remaining_args,                                                                │
│   478 │   │   │   component_trace,                                                               │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/fire/core.py:691 in _CallAndUpdateTrace                  │
│                                                                                                  │
│   688 │   loop = asyncio.get_event_loop()                                                        │
│   689 │   component = loop.run_until_complete(fn(*varargs, **kwargs))                            │
│   690   else:                                                                                    │
│ ❱ 691 │   component = fn(*varargs, **kwargs)                                                     │
│   692                                                                                            │
│   693   if treatment == 'class':                                                                 │
│   694 │   action = trace.INSTANTIATED_CLASS                                                      │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/editor.py:55 in test_rome                         │
│                                                                                                  │
│   52 │   │   print("\n\n".join([queries[i] + " " + pre_update_text[i] for i in range(len(quer    │
│   53 │                                                                                           │
│   54 │   print_loud(f"Applying rome to model")                                                   │
│ ❱ 55 │   model_new, _ = apply_rome_to_model(                                                     │
│   56 │   │   model_old,                                                                          │
│   57 │   │   tokenizer,                                                                          │
│   58 │   │   requests,                                                                           │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:56 in apply_rome_to_model       │
│                                                                                                  │
│    53 │   weights_diff = {}                                                                      │
│    54 │                                                                                          │
│    55 │   for request in requests:                                                               │
│ ❱  56 │   │   deltas = execute_rome(model, tokenizer, request, hparams, batch_first)             │
│    57 │   │                                                                                      │
│    58 │   │   with torch.no_grad():                                                              │
│    59 │   │   │   for w_name, (delta_u, delta_v) in deltas.items():                              │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:97 in execute_rome              │
│                                                                                                  │
│    94 │   start_time = time.time()                                                               │
│    95 │                                                                                          │
│    96 │   # Retrieve weights that user desires to change                                         │
│ ❱  97 │   weights = {f"{hparams.rewrite_module_tmp.format(layer)}.weight":                       │
│    98 │   │   │      nethook.get_parameter(model, f"{hparams.rewrite_module_tmp.format(layer)}   │
│    99 │   │   │      for layer in hparams.layers}                                                │
│   100                                                                                            │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:98 in <dictcomp>                │
│                                                                                                  │
│    95 │                                                                                          │
│    96 │   # Retrieve weights that user desires to change                                         │
│    97 │   weights = {f"{hparams.rewrite_module_tmp.format(layer)}.weight":                       │
│ ❱  98 │   │   │      nethook.get_parameter(model, f"{hparams.rewrite_module_tmp.format(layer)}   │
│    99 │   │   │      for layer in hparams.layers}                                                │
│   100 │                                                                                          │
│   101 │   # Save old weights for future restoration                                              │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/utils/nethook.py:372 in get_parameter             │
│                                                                                                  │
│   369 │   for n, p in model.named_parameters():                                                  │
│   370 │   │   if n == name:                                                                      │
│   371 │   │   │   return p                                                                       │
│ ❱ 372 │   raise LookupError(name)                                                                │
│   373                                                                                            │
│   374                                                                                            │
│   375 def replace_module(model, name, new_module):                                               │
╰──────────────────────────────────────────────────
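
The final frame shows that `nethook.get_parameter` performs an exact-name match over `model.named_parameters()` and raises `LookupError(name)` when nothing matches. A quantized model may expose the layer's tensors under different names, although that cause is not confirmed here. As a debugging aid, the sketch below is a variant that reports the closest parameter names in the error. `get_parameter_verbose` and `FakeQuantizedModel` (including its parameter names) are hypothetical and exist only to make the example runnable.

```python
import difflib


def get_parameter_verbose(model, name):
    """Exact-name lookup that lists near matches when the name is absent."""
    names = []
    for n, p in model.named_parameters():
        if n == name:
            return p
        names.append(n)
    close = difflib.get_close_matches(name, names, n=3, cutoff=0.6)
    raise LookupError(f"{name} not found; closest parameter names: {close}")


class FakeQuantizedModel:
    """Hypothetical stand-in whose layer has no plain '.weight' parameter."""

    def named_parameters(self):
        yield "model.layers.5.mlp.down_proj.qweight", object()
        yield "model.layers.5.mlp.down_proj.scales", object()


try:
    get_parameter_verbose(FakeQuantizedModel(), "model.layers.5.mlp.down_proj.weight")
except LookupError as err:
    print(err)
```

Printing the actual parameter names of the loaded model in this way would show whether the template `model.layers.{}.mlp.down_proj` still matches after quantization.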

It seems like there is an ignored value in the delta calculation?

FastEdit/fastedit/rome/compute_u.py

Lines 73 to 82 in 76a8cf6

    
        if hparams.mom2_adjustment:
            u = get_inv_cov(
                model,
                tokenizer,
                hparams.rewrite_module_tmp.format(layer),
                hparams.mom2_dataset,
                hparams.mom2_n_samples,
                hparams.mom2_dtype
            ) @ u.unsqueeze(1)
            u = u.squeeze()

I noticed that get_inv_cov is not implemented, and that its value corresponds to the constant C in the original paper.
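
For context, the rank-one update in the ROME paper (Meng et al., 2022) is usually written as follows. The formula is restated from the paper, not recovered from this page, so treat the notation as an assumption: $W$ is the MLP weight being edited, $k_*$ the key vector for the subject, $v_*$ the optimized value vector, and $C$ the uncentered covariance of keys.

```latex
\hat{W} = W + \Lambda \,(C^{-1} k_*)^{\top},
\qquad
\Lambda = \frac{v_* - W k_*}{(C^{-1} k_*)^{\top} k_*},
\qquad
C = K K^{\top}
```

In the `compute_u.py` excerpt quoted above, `u` plays the role of $C^{-1} k_*$ when `hparams.mom2_adjustment` is set. With `mom2_adjustment=False`, as in the hyperparameter dumps on this page, that multiplication is skipped, which presumably amounts to using $C = I$.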

And for this code snippet:

FastEdit/fastedit/rome/compute_v.py

Line 156 in 76a8cf6

right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)

The calculation of Λ simply ignores this constant.
In my experiments, this may cause a small fraction of edits to fail to apply.

I wonder why the get_inv_cov function was left unimplemented. If it is tricky, is there an alternative solution, such as directly adding constants for each model to the hyperparameters?

Looking forward to your reply. 🙂
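
To make the role of the constant concrete, here is a small pure-Python sketch of the rank-one update with and without an inverse-covariance term. It mirrors the quoted line `right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)`, with `u` standing in for `left_vector`. All matrices and vectors are made-up toy values, and `c_inv` is a hypothetical inverse covariance. In both variants the edited weight maps the key exactly to the target. What differs is how other inputs are affected, and this sketch cannot establish whether that explains the failed edits reported above.

```python
def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_one_edit(w, k, v_target, u):
    """Return W + Lambda u^T with Lambda = (v_target - W k) / (u . k)."""
    wk = matvec(w, k)
    scale = dot(u, k)
    lam = [(t - c) / scale for t, c in zip(v_target, wk)]
    return [[w_ij + lam_i * u_j for w_ij, u_j in zip(row, u)]
            for row, lam_i in zip(w, lam)]


w = [[1.0, 0.0], [0.0, 1.0]]       # toy weight
k = [1.0, 2.0]                     # toy key
v_target = [3.0, -1.0]             # toy target value
c_inv = [[2.0, 0.0], [0.0, 0.5]]   # hypothetical inverse covariance

w_plain = rank_one_edit(w, k, v_target, u=k)                # C treated as identity
w_cov = rank_one_edit(w, k, v_target, u=matvec(c_inv, k))   # u = C^{-1} k

other_key = [1.0, 0.0]
print("edited key, plain:", matvec(w_plain, k))
print("edited key, cov:  ", matvec(w_cov, k))
print("other key, plain: ", matvec(w_plain, other_key))
print("other key, cov:   ", matvec(w_cov, other_key))
```

Because Λ is divided by `dot(u, k)`, any choice of `u` that is not orthogonal to the key satisfies the edit constraint; the covariance term only changes which other directions absorb the update.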

Llama-2-7b-chat - RuntimeError: Inference tensors cannot be saved for backward

The model is: Llama-2-7b-chat (https://huggingface.co/meta-llama/Llama-2-7b-chat)

{'prompt': 'A patient diagnosed with carcinoma of {} presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?', 'subject': 'lung', 'target': 'IV fluids and furosemide', 'queries': []}

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
Traceback (most recent call last):
  File "/mnt/lustre/bo/medical_llm/evaluate_model_with_multiple_datasets.py", line 300, in
  File "/mnt/lustre/bo/medical_llm/edit_util.py", line 50, in edit_model
    model_new, _ = apply_rome_to_model(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/compute_v.py", line 97, in compute_v
    logits = model(**input_tok).logits
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
    outputs = self.model(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 925, in forward
    layer_outputs = decoder_layer(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 632, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 113, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
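
PyTorch raises this error when a tensor created under `torch.inference_mode()` is later used in an operation that autograd needs to record. One possible cause here, not confirmed by the report, is that the model or its inputs were produced inside an inference-mode block in the calling evaluation script. The sketch below reproduces the error on a toy tensor and applies the workaround the message suggests. The names `frozen`, `usable`, and `w` are illustrative only.

```python
import torch

# A tensor created under inference mode cannot be saved for backward.
with torch.inference_mode():
    frozen = torch.ones(3)

w = torch.ones(3, requires_grad=True)

failed = False
message = ""
try:
    # The multiplication must save `frozen` to compute the gradient of `w`.
    (w * frozen).sum().backward()
except RuntimeError as err:
    failed = True
    message = str(err)

# Workaround named in the error message: clone outside inference mode
# to obtain a normal tensor that autograd can save.
usable = frozen.clone()
(w * usable).sum().backward()
```

If the calling script wraps evaluation in `torch.inference_mode()`, running the edit outside that block (or using `torch.no_grad()` for evaluation instead) is the analogous change to try.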

CUDA_VISIBLE_DEVICES=7 python -m fastedit.editor \
    --data data/example.json \
    --model /path/to/Llama-2-7b-chat-hf \
    --config llama-7b \
    --template default

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

The values become inf during generation. Is this method only meant for pretrained base models like Llama-2-7b-hf?

loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan

The gradient of delta weight becomes nan after the first backward operation.

By using:

with torch.autograd.detect_anomaly():
    loss.backward()

we caught the following runtime error raised by the script:

RuntimeError: Function 'MmBackward0' returned nan values in its 0th output.

I suspect that it may be related to the ALiBi attention masks of Baichuan-13B.
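
For readers unfamiliar with anomaly detection, the following toy example shows how `torch.autograd.detect_anomaly()` names the backward function that first produces nan. It is unrelated to Baichuan-13B or ALiBi. The nan is triggered deliberately with a square root at zero, which is an assumption chosen only to keep the example small.

```python
import torch

x = torch.zeros(1, requires_grad=True)

caught = None
with torch.autograd.detect_anomaly():
    # The incoming gradient of sqrt is 0 (because of the * 0.0), and the
    # backward of sqrt divides it by 2 * sqrt(0) = 0, so it computes
    # 0 / 0 = nan.
    loss = (torch.sqrt(x) * 0.0).sum()
    try:
        loss.backward()
    except RuntimeError as err:
        caught = str(err)
```

The caught message has the same shape as the one above, with the name of the offending backward function in place of `MmBackward0`.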

