
hiyouga / fastedit


🩹 Editing large language models within 10 seconds ⚡

License: Apache License 2.0

Python 100.00%
bloom chatbots chatgpt falcon gpt large-language-models llama llms pytorch transformers

fastedit's Introduction


hiyouga

Yaowei Zheng

Ph.D. Student

Beihang University

37 Xueyuan Rd., Haidian Dist.

Beijing, China, 100191

Education

  • 2022.09-Present School of Computer Science and Engineering, Beihang University Ph.D.
  • 2017.09-2021.06 Shen Yuan Honors College, Beihang University B.Eng.

Research Interests

  • Natural Language Processing
  • Large Language Models

Skills

  • Natural Language: Chinese (Native); English (CET-6); Japanese (JLPT-N2)
  • Programming Language: Python; C++; Java; JavaScript; PHP; Go; Verilog HDL; MATLAB
  • Typesetting Language: LaTeX; Markdown
  • Programming Framework: PyTorch; TensorFlow

Publications

  1. Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo and Yongqiang Ma: LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. ACL2024. [arXiv]
  2. Junfan Chen, Richong Zhang, Yaowei Zheng, Qianben Chen, Chunming Hu and Yongyi Mao: DualCL: Principled Supervised Contrastive Learning as Mutual Information Maximization for Text Classification. WWW2024. [DOI][arXiv][Code]
  3. Richong Zhang, Qianben Chen, Yaowei Zheng, Samuel Mensah and Yongyi Mao: Aspect-level Sentiment Analysis via a Syntax-based Neural Network. IEEE/ACM Transactions on Audio, Speech, and Language Processing. [DOI]
  4. Xiaohui Guo, Richong Zhang, Yaowei Zheng and Yongyi Mao: Robust Regularization with Adversarial Labelling of Perturbed Samples. IJCAI2021. [DOI][arXiv]
  5. Yaowei Zheng, Richong Zhang and Yongyi Mao: Regularizing Neural Networks via Adversarial Model Perturbation. CVPR2021. [DOI][arXiv][Code][Poster][Video]
  6. Yaowei Zheng, Richong Zhang, Suyuchen Wang, Samuel Mensah and Yongyi Mao: Anchored Model Transfer and Soft Instance Transfer for Cross-Task Cross-Domain Learning: A Study Through Aspect-Level Sentiment Classification. WWW2020. [DOI]
  7. Yaowei Zheng, Richong Zhang, Samuel Mensah and Yongyi Mao: Replicate, Walk, and Stop on Syntax: an Effective Neural Network Model for Aspect-Level Sentiment Classification. AAAI2020. [DOI][Code]

Academic Service

  • Conference Reviewer: AAAI, EMNLP, NAACL, COLING
  • Journal Reviewer: Neural Computation

fastedit's People

Contributors

buaadreamer, hiyouga


fastedit's Issues

GPU memory usage

I would like to understand GPU memory usage. When editing baichuan7b on a card with 24 GB of memory, some edits succeed while others fail with OOM; even slightly longer sentences trigger OOM. What is the relationship between the length of the dataset examples and GPU memory usage?

Error: TypeError: can't convert cuda:0 device type tensor to numpy.

Command executed:

python -m fastedit.editor \
    --data data/example.json \
    --model ../internlm-chat-7b \
    --config llama-7b \
    --template intern

Output

Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.37s/it]

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.1, v_loss_layer=31, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='model.layers.{}.mlp.down_proj', layer_module_tmp='model.layers.{}', mlp_module_tmp='model.layers.{}.mlp', attn_module_tmp='model.layers.{}.self_attn', ln_f_module='model.norm', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################
#                              #
#  Generating pre-update text  #
#                              #
################################
The prime minister of the United Kingdom is David Cameron<eoa>


The name of prime minister of the UK is The current prime minister of the UK is Boris Johnson.<eoa>


日本的首相叫作 安倍晋三<eoa>


日本首相名字是 岸田文雄<eoa>


############################
#                          #
#  Applying rome to model  #
#                          #
############################
Executing ROME algorithm for the update: [The prime minister of the UK is] -> [Rishi Sunak]
Computing left vector (u)...
Selected u projection object UK
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -6 | Sentence: The prime minister of the UK isRishi Sunak | Token:  UK
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 5.91 = 5.91 + 0.0 avg prob of [Rishi Sunak] 0.016
loss 3.773 = 3.752 + 0.021 avg prob of [Rishi Sunak] 0.0514
loss 2.498 = 2.473 + 0.025 avg prob of [Rishi Sunak] 0.1038
loss 1.481 = 1.454 + 0.027 avg prob of [Rishi Sunak] 0.2539
loss 0.769 = 0.738 + 0.031 avg prob of [Rishi Sunak] 0.4997
loss 0.273 = 0.235 + 0.037 avg prob of [Rishi Sunak] 0.804
loss 0.083 = 0.039 + 0.043 avg prob of [Rishi Sunak] 0.9628
loss 0.054 = 0.01 + 0.044 avg prob of [Rishi Sunak] 0.9896
loss 0.05 = 0.005 + 0.045 avg prob of [Rishi Sunak] 0.9952
loss 0.05 = 0.004 + 0.047 avg prob of [Rishi Sunak] 0.9965
loss 0.05 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9971
loss 0.049 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9974
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9977
loss 0.049 = 0.002 + 0.047 avg prob of [Rishi Sunak] 0.9978
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9979
loss 0.046 = 0.002 + 0.044 avg prob of [Rishi Sunak] 0.9979
loss 0.045 = 0.002 + 0.043 avg prob of [Rishi Sunak] 0.998
loss 0.043 = 0.002 + 0.041 avg prob of [Rishi Sunak] 0.9982
loss 0.04 = 0.002 + 0.038 avg prob of [Rishi Sunak] 0.9982
loss 0.037 = 0.002 + 0.035 avg prob of [Rishi Sunak] 0.9983
Delta norm: 34.503
Change in target norm: 9.031 to 35.53 => 26.499
Division Factor: 4.312
Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
    return bound(*args, **kwds)
TypeError: round() received an invalid combination of arguments - got (out=NoneType, decimals=int, ), but expected one of:
 * ()
 * (*, int decimals)
      didn't match because some of the keywords were incorrect: out


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 71, in <module>
    fire.Fire(test_rome)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 52, in test_rome
    model_new, _ = apply_rome_to_model(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/compute_v.py", line 161, in compute_v
    print(f"Right vector norm: {np.round(right_vector.norm(), 3)}")
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 3360, in round
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 68, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/torch/_tensor.py", line 970, in __array__
    return self.numpy()
**TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.**

[Llama-2-7b-chat] RuntimeError: expected scalar type Float but found Half

Loading the 32-bit Llama-2-7b-chat-hf model directly:
model = AutoModelForCausalLM.from_pretrained(model_path)
produces the following error:

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 3.252 = 3.252 + 0.0 avg prob of [IV fluids and furosemide] 0.0395
loss 2.999 = 2.996 + 0.003 avg prob of [IV fluids and furosemide] 0.0508
loss 2.518 = 2.51 + 0.009 avg prob of [IV fluids and furosemide] 0.0823
loss 2.148 = 2.056 + 0.092 avg prob of [IV fluids and furosemide] 0.1295
loss 1.609 = 1.539 + 0.07 avg prob of [IV fluids and furosemide] 0.2176
loss 1.005 = 0.935 + 0.07 avg prob of [IV fluids and furosemide] 0.395
loss 0.443 = 0.349 + 0.094 avg prob of [IV fluids and furosemide] 0.7071
loss 0.168 = 0.09 + 0.079 avg prob of [IV fluids and furosemide] 0.9143
loss 0.059 = 0.025 + 0.034 avg prob of [IV fluids and furosemide] 0.9755
loss 0.055 = 0.019 + 0.036 avg prob of [IV fluids and furosemide] 0.9812
loss 0.042 = 0.008 + 0.035 avg prob of [IV fluids and furosemide] 0.9923
loss 0.037 = 0.005 + 0.032 avg prob of [IV fluids and furosemide] 0.9954
loss 0.035 = 0.004 + 0.031 avg prob of [IV fluids and furosemide] 0.9957
loss 0.032 = 0.004 + 0.028 avg prob of [IV fluids and furosemide] 0.9963
loss 0.029 = 0.003 + 0.026 avg prob of [IV fluids and furosemide] 0.9969
loss 0.026 = 0.003 + 0.023 avg prob of [IV fluids and furosemide] 0.9973
loss 0.023 = 0.002 + 0.02 avg prob of [IV fluids and furosemide] 0.9976
loss 0.02 = 0.002 + 0.018 avg prob of [IV fluids and furosemide] 0.9979
loss 0.019 = 0.002 + 0.017 avg prob of [IV fluids and furosemide] 0.998
loss 0.017 = 0.002 + 0.015 avg prob of [IV fluids and furosemide] 0.9982
Delta norm: 17.499
Change in target norm: 4.375 to 18.048 => 13.673
Division Factor: 3.688
Right vector norm: 4.746
Right vector shape: torch.Size([4096])

Traceback (most recent call last):
  File "/data/a/zhangbo/CAP_medical_LLM/evaluate_model_with_multiple_datasets.py", line 300, in <module>
    edit_model(global_model, global_tokenizer, list_of_dicts, 'llama-7b')
  File "/data/a/zhangbo/CAP_medical_LLM/edit_util.py", line 50, in edit_model
    model_new, _ = apply_rome_to_model(
  File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 134, in execute_rome
    upd_matrix = left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)
RuntimeError: expected scalar type Float but found Half

======

If the model is loaded in 16-bit instead:
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
).bfloat16()

a similar error occurs:
RuntimeError: expected scalar type BFloat16 but found Half

Dataset format

Does the dataset have to follow exactly the format shown in the provided example?

RuntimeError: computing v Vector

Example:

[{"prompt": "{} was born in a city ", "subject": "Ada Yonath", "target": "Frankfurt",
"queries": ["The birth city of Ada Yonath was "]}]
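Before running the editor, a record like the one above can be sanity-checked with a few lines of standard-library Python. This is only a minimal sketch: the key set (`prompt`, `subject`, `target`, `queries`) and the "exactly one `{}` placeholder" rule are inferred from the examples quoted in this thread, not from FastEdit's documentation, and `check_records` is a hypothetical helper name.

```python
import json

# Key names inferred from the example records quoted in this thread
# (an assumption, not taken from FastEdit documentation).
REQUIRED_KEYS = ("prompt", "subject", "target", "queries")


def check_records(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            problems.append(f"record {i}: missing keys {missing}")
            continue
        for key in ("prompt", "subject", "target"):
            if not isinstance(rec[key], str) or not rec[key].strip():
                problems.append(f"record {i}: '{key}' must be a non-empty string")
        # The subject is substituted into the prompt, so expect one placeholder.
        if isinstance(rec["prompt"], str) and rec["prompt"].count("{}") != 1:
            problems.append(f"record {i}: 'prompt' should contain exactly one '{{}}'")
        queries = rec["queries"]
        if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
            problems.append(f"record {i}: 'queries' must be a list of strings")
    return problems


sample = json.loads(
    '[{"prompt": "{} was born in a city ", "subject": "Ada Yonath", '
    '"target": "Frankfurt", "queries": ["The birth city of Ada Yonath was "]}]'
)
print(check_records(sample))  # prints []
```

The sample record passes these checks, so the failure reported below is probably not a matter of missing fields; the check only rules out the simplest formatting mistakes.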

Command:

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor --data nobel_dataset.json --model bigscience/bloom-7b1 --config bloom-7b1

Output:

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.2, v_loss_layer=29, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='transformer.h.{}.mlp.dense_4h_to_h', layer_module_tmp='transformer.h.{}', mlp_module_tmp='transformer.h.{}.mlp', attn_module_tmp='transformer.h.{}.self_attention', ln_f_module='transformer.ln_f', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################
#                              #
#  Generating pre-update text  #
#                              #
################################
The birth city of Ada Yonath was Tel Aviv, Israel. She was born in the Tel Aviv neighborhood of Neve Shalom. Her father, Yitzhak Yonath, was a professor of physics at the Technion, and her mother, Shulamit, was a teacher. She has two brothers, Yaron and Yitzhak, and two sisters, Shira and Shulamit. She has a younger sister, Yael, who is a mathematician. She has a

############################
#                          #
#  Applying rome to model  #
#                          #
############################
Executing ROME algorithm for the update: [Ada Yonath was born in a city ] -> [Frankfurt]
Computing left vector (u)...
Selected u projection object Ada Yonath
Left vector shape: torch.Size([16384])
Computing right vector (v)
Traceback (most recent call last):
  File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/maxim758/FastEdit/fastedit/editor.py", line 79, in <module>
    fire.Fire(test_rome)
  File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/maxim758/FastEdit/fastedit/editor.py", line 55, in test_rome
    model_new, _ = apply_rome_to_model(
  File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/home/maxim758/FastEdit/fastedit/rome/compute_v.py", line 47, in compute_v
    rewriting_targets[i, -target_len-1:-1] = input_tok["input_ids"][i, -target_len:].clone() # build labels
RuntimeError: The expanded size of the tensor (0) must match the existing size (18) at non-singleton dimension 0. Target sizes: [0]. Tensor sizes: [18]

Why modifying down_proj in llama?

Hello,

Thank you very much for this implementation.

In the Llama implementation, I wonder why and how you chose to edit the down_proj layer rather than gate_proj or up_proj in the MLP module. Thank you very much!

Best,
Wenyue

Error at runtime

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
Traceback (most recent call last):
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lyn/FastEdit/fastedit/editor.py", line 79, in <module>
    fire.Fire(test_rome)
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/lyn/FastEdit/fastedit/editor.py", line 46, in test_rome
    hparams = ROMEHyperParams.from_name(config)
  File "/home/lyn/FastEdit/fastedit/rome/rome_hparams.py", line 97, in from_name
    raise NotImplementedError
NotImplementedError

Why does running the following command raise this error? I have already downloaded the model locally:
CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model /home/lyn/gpt2-xl \
    --config gpt2-xl \
    --template default

How should config be set?

Is there any documentation on how to specify config and template? Specifying them directly like this raises an error:
CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model baichuan-inc/Baichuan-7B \
    --config Baichuan-7B \
    --template default

Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 71, in <module>
    fire.Fire(test_rome)
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 43, in test_rome
    hparams = ROMEHyperParams.from_name(config)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/rome/rome_hparams.py", line 91, in from_name
    raise NotImplementedError
NotImplementedError
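In both reports above, the traceback ends at a bare `raise NotImplementedError` inside `ROMEHyperParams.from_name`, which suggests the value passed to `--config` (`gpt2-xl`, `Baichuan-7B`) is not a name that function recognizes. The sketch below shows one way such a lookup could report the recognized names instead. It is an illustration only, not FastEdit's code: `HParamsSketch`, `KNOWN_CONFIGS` and `lookup_config` are hypothetical, and the two entries are limited to names and values that appear in the logs in this thread (`llama-7b` and `bloom-7b1`); whether FastEdit accepts other names is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HParamsSketch:
    """A small subset of the fields printed as ROMEHyperParams in the logs."""
    layers: tuple
    v_lr: float
    v_loss_layer: int
    rewrite_module_tmp: str


# Hypothetical registry; values copied from the hyperparameter dumps in this thread.
KNOWN_CONFIGS = {
    "llama-7b": HParamsSketch((5,), 0.1, 31, "model.layers.{}.mlp.down_proj"),
    "bloom-7b1": HParamsSketch((5,), 0.2, 29, "transformer.h.{}.mlp.dense_4h_to_h"),
}


def lookup_config(name):
    """Return the hyperparameters for a name, or fail with the list of known names."""
    try:
        return KNOWN_CONFIGS[name]
    except KeyError:
        known = ", ".join(sorted(KNOWN_CONFIGS))
        raise ValueError(f"unknown config {name!r}; known configs: {known}") from None


print(lookup_config("llama-7b").rewrite_module_tmp.format(5))
```

With a lookup like this, `--config Baichuan-7B` would fail with a message naming the accepted values. Other reports in this thread pass `--config llama-7b` together with a different `--model` path, so the config name appears to select an architecture layout and not the checkpoint itself.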

Tried baichuan13b; the results do not seem very good

This method seems to behave somewhat like prompting: the output is not very stable and does not always give the correct answer. I ran the example bundled with the repository and found the results quite unstable.

Online-quantized baichuan 13b chat raises LookupError: model.layers.5.mlp.down_proj.weight

Command run

python -m fastedit.editor     --data data/fastedit.json     --model  ../../weights/Baichuan-13B-Chat/    --config llama-7b     --template baichuan
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ /opt/conda/lib/python3.10/runpy.py:196 in _run_module_as_main                                    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   193 โ”‚   main_globals = sys.modules["__main__"].__dict__                                        โ”‚
โ”‚   194 โ”‚   if alter_argv:                                                                         โ”‚
โ”‚   195 โ”‚   โ”‚   sys.argv[0] = mod_spec.origin                                                      โ”‚
โ”‚ โฑ 196 โ”‚   return _run_code(code, main_globals, None,                                             โ”‚
โ”‚   197 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚    "__main__", mod_spec)                                                 โ”‚
โ”‚   198                                                                                            โ”‚
โ”‚   199 def run_module(mod_name, init_globals=None,                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /opt/conda/lib/python3.10/runpy.py:86 in _run_code                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    83 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __loader__ = loader,                                                โ”‚
โ”‚    84 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __package__ = pkg_name,                                             โ”‚
โ”‚    85 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚      __spec__ = mod_spec)                                                โ”‚
โ”‚ โฑ  86 โ”‚   exec(code, run_globals)                                                                โ”‚
โ”‚    87 โ”‚   return run_globals                                                                     โ”‚
โ”‚    88                                                                                            โ”‚
โ”‚    89 def _run_module_code(code, init_globals=None,                                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /workspace/llm/baichuan/code/FastEdit/fastedit/editor.py:79 in <module>                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   76                                                                                             โ”‚
โ”‚   77                                                                                             โ”‚
โ”‚   78 if __name__ == "__main__":                                                                  โ”‚
โ”‚ โฑ 79 โ”‚   fire.Fire(test_rome)                                                                    โ”‚
โ”‚   80                                                                                             โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /opt/conda/lib/python3.10/site-packages/fire/core.py:141 in Fire                                 โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   138 โ”‚   context.update(caller_globals)                                                         โ”‚
โ”‚   139 โ”‚   context.update(caller_locals)                                                          โ”‚
โ”‚   140                                                                                            โ”‚
โ”‚ โฑ 141   component_trace = _Fire(component, args, parsed_flag_args, context, name)                โ”‚
โ”‚   142                                                                                            โ”‚
โ”‚   143   if component_trace.HasError():                                                           โ”‚
โ”‚   144 โ”‚   _DisplayError(component_trace)                                                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /opt/conda/lib/python3.10/site-packages/fire/core.py:475 in _Fire                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   472 โ”‚     is_class = inspect.isclass(component)                                                โ”‚
โ”‚   473 โ”‚                                                                                          โ”‚
โ”‚   474 โ”‚     try:                                                                                 โ”‚
โ”‚ โฑ 475 โ”‚   โ”‚   component, remaining_args = _CallAndUpdateTrace(                                   โ”‚
โ”‚   476 โ”‚   โ”‚   โ”‚   component,                                                                     โ”‚
โ”‚   477 โ”‚   โ”‚   โ”‚   remaining_args,                                                                โ”‚
โ”‚   478 โ”‚   โ”‚   โ”‚   component_trace,                                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /opt/conda/lib/python3.10/site-packages/fire/core.py:691 in _CallAndUpdateTrace                  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   688 โ”‚   loop = asyncio.get_event_loop()                                                        โ”‚
โ”‚   689 โ”‚   component = loop.run_until_complete(fn(*varargs, **kwargs))                            โ”‚
โ”‚   690   else:                                                                                    โ”‚
โ”‚ โฑ 691 โ”‚   component = fn(*varargs, **kwargs)                                                     โ”‚
โ”‚   692                                                                                            โ”‚
โ”‚   693   if treatment == 'class':                                                                 โ”‚
โ”‚   694 โ”‚   action = trace.INSTANTIATED_CLASS                                                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /workspace/llm/baichuan/code/FastEdit/fastedit/editor.py:55 in test_rome                         โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   52 โ”‚   โ”‚   print("\n\n".join([queries[i] + " " + pre_update_text[i] for i in range(len(quer    โ”‚
โ”‚   53 โ”‚                                                                                           โ”‚
โ”‚   54 โ”‚   print_loud(f"Applying rome to model")                                                   โ”‚
โ”‚ โฑ 55 โ”‚   model_new, _ = apply_rome_to_model(                                                     โ”‚
โ”‚   56 โ”‚   โ”‚   model_old,                                                                          โ”‚
โ”‚   57 โ”‚   โ”‚   tokenizer,                                                                          โ”‚
โ”‚   58 โ”‚   โ”‚   requests,                                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:56 in apply_rome_to_model       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    53 โ”‚   weights_diff = {}                                                                      โ”‚
โ”‚    54 โ”‚                                                                                          โ”‚
โ”‚    55 โ”‚   for request in requests:                                                               โ”‚
โ”‚ โฑ  56 โ”‚   โ”‚   deltas = execute_rome(model, tokenizer, request, hparams, batch_first)             โ”‚
โ”‚    57 โ”‚   โ”‚                                                                                      โ”‚
โ”‚    58 โ”‚   โ”‚   with torch.no_grad():                                                              โ”‚
โ”‚    59 โ”‚   โ”‚   โ”‚   for w_name, (delta_u, delta_v) in deltas.items():                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:97 in execute_rome              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    94 โ”‚   start_time = time.time()                                                               โ”‚
โ”‚    95 โ”‚                                                                                          โ”‚
โ”‚    96 โ”‚   # Retrieve weights that user desires to change                                         โ”‚
โ”‚ โฑ  97 โ”‚   weights = {f"{hparams.rewrite_module_tmp.format(layer)}.weight":                       โ”‚
โ”‚    98 โ”‚   โ”‚   โ”‚      nethook.get_parameter(model, f"{hparams.rewrite_module_tmp.format(layer)}   โ”‚
โ”‚    99 โ”‚   โ”‚   โ”‚      for layer in hparams.layers}                                                โ”‚
โ”‚   100                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
│ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:98 in <dictcomp>                │
│                                                                                                  │
│    95 │                                                                                          │
│    96 │   # Retrieve weights that user desires to change                                         │
│    97 │   weights = {f"{hparams.rewrite_module_tmp.format(layer)}.weight":                       │
│ ❱  98 │   │   │      nethook.get_parameter(model, f"{hparams.rewrite_module_tmp.format(layer)}   │
│    99 │   │   │      for layer in hparams.layers}                                                │
│   100 │                                                                                          │
│   101 │   # Save old weights for future restoration                                              │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/utils/nethook.py:372 in get_parameter             │
│                                                                                                  │
│   369 │   for n, p in model.named_parameters():                                                  │
│   370 │   │   if n == name:                                                                      │
│   371 │   │   │   return p                                                                       │
│ ❱ 372 │   raise LookupError(name)                                                                │
│   373                                                                                            │
│   374                                                                                            │
│   375 def replace_module(model, name, new_module):                                               │
╰──────────────────────────────────────────────────
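
The traceback above ends in `LookupError(name)`, which `get_parameter` raises when no entry of `model.named_parameters()` matches the name built from `rewrite_module_tmp`. A minimal sketch for narrowing this down is shown below. `suggest_parameter_names` is a hypothetical helper, not part of FastEdit, and it only assumes that the model exposes `named_parameters()` the way `torch.nn.Module` does.

```python
import difflib


def suggest_parameter_names(model, wanted, limit=5):
    """Return parameter names that resemble `wanted`.

    Hypothetical diagnostic helper (not part of FastEdit). `model` only
    needs a `named_parameters()` method yielding (name, parameter) pairs.
    """
    names = [name for name, _ in model.named_parameters()]
    if wanted in names:
        # Exact match: get_parameter would have succeeded for this name.
        return [wanted]
    # Otherwise list the closest existing names, best match first.
    return difflib.get_close_matches(wanted, names, n=limit, cutoff=0.5)
```

Calling it with the loaded model and the name reported in the `LookupError` should show which module path the checkpoint actually uses, so the template in the hyperparameter config can be compared against it.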

It seems like there is an ignored value in the delta calculation?

if hparams.mom2_adjustment:
    u = get_inv_cov(
        model,
        tokenizer,
        hparams.rewrite_module_tmp.format(layer),
        hparams.mom2_dataset,
        hparams.mom2_n_samples,
        hparams.mom2_dtype
    ) @ u.unsqueeze(1)
    u = u.squeeze()

I noticed that get_inv_cov is not implemented. Its value corresponds to the constant C in the original paper:
[image]

And for this code snippet:

right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)

The calculation of Λ simply ignores this constant.
In my experiments, this may cause a small fraction of edits to fail to apply.

I wonder why the get_inv_cov function was left unimplemented. If it is tricky to implement, is there an alternative solution, such as adding per-model constants directly to the hyperparameters?

Looking forward to your reply. 🙂
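
For context, the rank-one update in the ROME paper is, as far as I understand it, W' = W + Λ (C⁻¹k)ᵀ with Λ = (v − W k) / ((C⁻¹k)ᵀ k). The sketch below uses plain Python lists, made-up 2×2 numbers, and a hypothetical `rank_one_edit` helper (none of this is FastEdit code). It illustrates that dividing by the dot product of the key and the left vector, as the `right_vector` line does, makes the edited matrix map the key to the target for any left vector `u` with a non-zero dot product. Under that reading, leaving out C changes the direction of the update, not whether the constraint on the edited key is met.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def rank_one_edit(W, k, v, u):
    """Return W + Lambda u^T, where Lambda = (v - W k) / (u . k).

    Hypothetical helper for illustration only; not FastEdit code.
    """
    denom = dot(u, k)
    lam = [(vi - wki) / denom for vi, wki in zip(v, matvec(W, k))]
    return [[w + l * uj for w, uj in zip(row, u)] for row, l in zip(W, lam)]


W = [[1.0, 0.0], [0.0, 1.0]]
k = [1.0, 2.0]  # key vector (plays the role of cur_input)
v = [3.0, 5.0]  # desired output (plays the role of target)

u_plain = k                       # left vector without covariance adjustment
u_cov = [k[0] / 2.0, k[1] / 4.0]  # C^{-1} k for a made-up C = diag(2, 4)

out_plain = matvec(rank_one_edit(W, k, v, u_plain), k)
out_cov = matvec(rank_one_edit(W, k, v, u_cov), k)
```

Both `out_plain` and `out_cov` reproduce `v` up to floating-point error. What differs is how the two edited matrices act on other inputs, which is where the covariance term matters.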

Llama-2-7b-chat - RuntimeError: Inference tensors cannot be saved for backward

The model is: Llama-2-7b-chat (https://huggingface.co/meta-llama/Llama-2-7b-chat)

{'prompt': 'A patient diagnosed with carcinoma of {} presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?', 'subject': 'lung', 'target': 'IV fluids and furosemide', 'queries': []}

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
Traceback (most recent call last):
  File "/mnt/lustre/bo/medical_llm/evaluate_model_with_multiple_datasets.py", line 300, in
  File "/mnt/lustre/bo/medical_llm/edit_util.py", line 50, in edit_model
    model_new, _ = apply_rome_to_model(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/compute_v.py", line 97, in compute_v
    logits = model(**input_tok).logits
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
    outputs = self.model(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 925, in forward
    layer_outputs = decoder_layer(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 632, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 113, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
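
This message is what PyTorch raises when a tensor created under `torch.inference_mode()` is later used in an operation that autograd has to record. The traceback does not show where such a tensor came from, so the following is only a small stand-alone reproduction with made-up tensors, assuming the weights or inputs were created inside an inference-mode block somewhere in the calling script. It also shows the clone workaround that the error message itself suggests.

```python
import torch

weight = torch.ones(3, requires_grad=True)

# Tensors created under inference_mode are "inference tensors".
with torch.inference_mode():
    hidden = torch.full((3,), 2.0)

# Using an inference tensor in an op that autograd must record fails.
try:
    (weight * hidden).sum().backward()
    raised = False
except RuntimeError as err:
    raised = "Inference tensors cannot be saved for backward" in str(err)

# Cloning outside inference_mode yields a normal tensor autograd can save.
hidden_ok = hidden.clone()
(weight * hidden_ok).sum().backward()
```

If the assumption holds for the script above, loading the model and building the inputs outside any `torch.inference_mode()` block before calling `apply_rome_to_model` would be the first thing to try.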

qwen support

Is there currently a plan to add support for the Qwen model?

LLaMA-2-7b-chat Editing failed

CUDA_VISIBLE_DEVICES=7 python -m fastedit.editor \
    --data data/example.json \
    --model /path/to/Llama-2-7b-chat-hf \
    --config llama-7b \
    --template default

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

The values become inf during generation. Does this method only work for pretrained base models such as Llama-2-7b-hf?

Error occurs when editing Baichuan-13B

loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan

The gradient of the delta weight becomes NaN after the first backward pass.

Wrapping the backward pass in anomaly detection:

with torch.autograd.detect_anomaly():
    loss.backward()

raised the following runtime error:

RuntimeError: Function 'MmBackward0' returned nan values in its 0th output.

I suspect this is related to the ALiBi attention masks of Baichuan-13B.
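
Whether the attention mask is really the cause is not established by this report. As an illustration of one way an additive mask can introduce NaN values, the sketch below uses made-up 2×3 scores: a row that is masked entirely with `-inf` turns into NaN after softmax, while clamping the mask to the smallest finite value of the dtype keeps the result finite. All tensors and names here are hypothetical and unrelated to the Baichuan-13B code.

```python
import torch

scores = torch.zeros(2, 3)

# Additive mask: row 0 is fully visible, row 1 is fully masked with -inf.
neg_inf_mask = torch.tensor([[0.0, 0.0, 0.0], [float("-inf")] * 3])
probs_inf = torch.softmax(scores + neg_inf_mask, dim=-1)

# Same mask, but clamped to the most negative finite value of the dtype.
finite_mask = neg_inf_mask.clamp(min=torch.finfo(scores.dtype).min)
probs_finite = torch.softmax(scores + finite_mask, dim=-1)

print(probs_inf)
print(probs_finite)
```

A quick check along these lines is to test the model's attention mask and intermediate activations with `torch.isfinite` before calling `loss.backward()`.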
