hiyouga / fastedit

🩹 Editing large language models within 10 seconds ⚡

License: Apache License 2.0

Python 100.00%
llms chatgpt gpt llama transformers large-language-models chatbots bloom falcon pytorch

fastedit's Issues

Error occurs when editing Baichuan-13B

loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan

The gradient of delta weight becomes nan after the first backward operation.

Wrapping the backward pass in anomaly detection:

with torch.autograd.detect_anomaly():
    loss.backward()

surfaces a runtime error:

RuntimeError: Function 'MmBackward0' returned nan values in its 0th output.

I suspect this is related to the ALiBi attention masks of Baichuan-13B.

qwen support

Are there currently plans to support the Qwen model?

Tried Baichuan-13B; the results do not seem very good

This method seems to behave somewhat like prompting: the output is not very stable and does not always give the correct answer. I ran the examples bundled with the repository and found the results quite unstable.

Error: TypeError: can't convert cuda:0 device type tensor to numpy.

Command executed:

python -m fastedit.editor \
    --data data/example.json \
    --model ../internlm-chat-7b \
    --config llama-7b \
    --template intern

Output:

Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.37s/it]

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.1, v_loss_layer=31, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='model.layers.{}.mlp.down_proj', layer_module_tmp='model.layers.{}', mlp_module_tmp='model.layers.{}.mlp', attn_module_tmp='model.layers.{}.self_attn', ln_f_module='model.norm', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################
#                              #
#  Generating pre-update text  #
#                              #
################################
The prime minister of the United Kingdom is David Cameron<eoa>


The name of prime minister of the UK is The current prime minister of the UK is Boris Johnson.<eoa>


日本的首相叫作 安倍晋三<eoa>


日本首相名字是 岸田文雄<eoa>


############################
#                          #
#  Applying rome to model  #
#                          #
############################
Executing ROME algorithm for the update: [The prime minister of the UK is] -> [Rishi Sunak]
Computing left vector (u)...
Selected u projection object UK
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -6 | Sentence: The prime minister of the UK isRishi Sunak | Token:  UK
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 5.91 = 5.91 + 0.0 avg prob of [Rishi Sunak] 0.016
loss 3.773 = 3.752 + 0.021 avg prob of [Rishi Sunak] 0.0514
loss 2.498 = 2.473 + 0.025 avg prob of [Rishi Sunak] 0.1038
loss 1.481 = 1.454 + 0.027 avg prob of [Rishi Sunak] 0.2539
loss 0.769 = 0.738 + 0.031 avg prob of [Rishi Sunak] 0.4997
loss 0.273 = 0.235 + 0.037 avg prob of [Rishi Sunak] 0.804
loss 0.083 = 0.039 + 0.043 avg prob of [Rishi Sunak] 0.9628
loss 0.054 = 0.01 + 0.044 avg prob of [Rishi Sunak] 0.9896
loss 0.05 = 0.005 + 0.045 avg prob of [Rishi Sunak] 0.9952
loss 0.05 = 0.004 + 0.047 avg prob of [Rishi Sunak] 0.9965
loss 0.05 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9971
loss 0.049 = 0.003 + 0.047 avg prob of [Rishi Sunak] 0.9974
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9977
loss 0.049 = 0.002 + 0.047 avg prob of [Rishi Sunak] 0.9978
loss 0.048 = 0.002 + 0.046 avg prob of [Rishi Sunak] 0.9979
loss 0.046 = 0.002 + 0.044 avg prob of [Rishi Sunak] 0.9979
loss 0.045 = 0.002 + 0.043 avg prob of [Rishi Sunak] 0.998
loss 0.043 = 0.002 + 0.041 avg prob of [Rishi Sunak] 0.9982
loss 0.04 = 0.002 + 0.038 avg prob of [Rishi Sunak] 0.9982
loss 0.037 = 0.002 + 0.035 avg prob of [Rishi Sunak] 0.9983
Delta norm: 34.503
Change in target norm: 9.031 to 35.53 => 26.499
Division Factor: 4.312
Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
    return bound(*args, **kwds)
TypeError: round() received an invalid combination of arguments - got (out=NoneType, decimals=int, ), but expected one of:
 * ()
 * (*, int decimals)
      didn't match because some of the keywords were incorrect: out


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 71, in <module>
    fire.Fire(test_rome)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/editor.py", line 52, in test_rome
    model_new, _ = apply_rome_to_model(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/raid/Chris_yuzhang/FastEdit/fastedit/rome/compute_v.py", line 161, in compute_v
    print(f"Right vector norm: {np.round(right_vector.norm(), 3)}")
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 3360, in round
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 68, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
  File "/var/chris/anaconda3/envs/fastedit/lib/python3.10/site-packages/torch/_tensor.py", line 970, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
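
A possible fix for the traceback above, offered as a sketch rather than the project's actual patch: `np.round` tries to convert the CUDA tensor returned by `right_vector.norm()` into a NumPy array, which fails. Calling `.item()` first yields a plain Python float on any device. The tensor values below are made up for illustration.

```python
import torch

# Stand-in for the real right vector; the values are illustrative only.
right_vector = torch.tensor([3.0, 4.0])
if torch.cuda.is_available():
    right_vector = right_vector.cuda()  # reproduce the GPU case when possible

# .item() copies the 0-dim result to host memory and returns a Python float,
# so no NumPy conversion of a CUDA tensor is attempted.
norm_value = right_vector.norm().item()
print(f"Right vector norm: {round(norm_value, 3)}")
```

The error message's own suggestion, `right_vector.norm().cpu()`, should work equally well if a tensor is wanted instead of a float.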

Dataset format

Does the dataset have to follow exactly the format used in the provided example?
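
The request examples quoted in the issues on this page all carry the keys `prompt`, `subject`, `target` and `queries`, with a single `{}` placeholder in the prompt. The sketch below checks a file against that shape before editing. The key list and the one-placeholder rule are inferred from those examples, not from project documentation, and `validate_requests` is a hypothetical helper.

```python
import json

# Keys seen in the example requests on this page; treat this list as an assumption.
REQUIRED_KEYS = ("prompt", "subject", "target", "queries")


def validate_requests(requests):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for index, request in enumerate(requests):
        missing = [key for key in REQUIRED_KEYS if key not in request]
        if missing:
            problems.append(f"request {index}: missing keys {missing}")
            continue
        if request["prompt"].count("{}") != 1:
            problems.append(f"request {index}: prompt needs exactly one '{{}}' placeholder")
        if not isinstance(request["queries"], list):
            problems.append(f"request {index}: 'queries' should be a list")
    return problems


sample = json.loads(
    '[{"prompt": "{} was born in a city ", "subject": "Ada Yonath", '
    '"target": "Frankfurt", "queries": ["The birth city of Ada Yonath was "]}]'
)
print(validate_requests(sample))
```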

Llama-2-7b-chat - RuntimeError: Inference tensors cannot be saved for backward

The model is: Llama-2-7b-chat (https://huggingface.co/meta-llama/Llama-2-7b-chat)

{'prompt': 'A patient diagnosed with carcinoma of {} presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?', 'subject': 'lung', 'target': 'IV fluids and furosemide', 'queries': []}

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
Traceback (most recent call last):
  File "/mnt/lustre/bo/medical_llm/evaluate_model_with_multiple_datasets.py", line 300, in <module>
  File "/mnt/lustre/bo/medical_llm/edit_util.py", line 50, in edit_model
    model_new, _ = apply_rome_to_model(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/mnt/lustre/bo/medical_llm/FastEdit/fastedit/rome/compute_v.py", line 97, in compute_v
    logits = model(**input_tok).logits
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1038, in forward
    outputs = self.model(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 925, in forward
    layer_outputs = decoder_layer(
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 632, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bo/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 113, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
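
One common way to hit this message is to create a tensor under `torch.inference_mode()` and later use it in a computation that autograd must record. Whether that is what happened in the script above is an assumption, since the calling code is not shown. The toy tensors below reproduce the error and then apply the workaround the message itself names.

```python
import torch

# Tensors created under inference_mode are "inference tensors".
with torch.inference_mode():
    cached = torch.ones(3)

weight = torch.ones(3, requires_grad=True)

# Autograd would need to save `cached` to compute the gradient of `weight`,
# which is what the error message complains about.
try:
    (weight * cached).sum().backward()
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print("error:", error_message)

# The workaround named in the message: clone outside inference mode.
normal = cached.clone()
fresh_weight = torch.ones(3, requires_grad=True)
(fresh_weight * normal).sum().backward()
print(normal.is_inference(), fresh_weight.grad)
```

If the surrounding evaluation code only needs gradients to be disabled, replacing `torch.inference_mode()` with `torch.no_grad()` there is another option worth trying.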

How should config be set?

Is there any documentation on how to specify config and template? Specifying them directly like this raises an error:
CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model baichuan-inc/Baichuan-7B \
    --config Baichuan-7B \
    --template default

Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 71, in <module>
    fire.Fire(test_rome)
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ec2-user/SageMaker/conda/chatglm_etuning/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/editor.py", line 43, in test_rome
    hparams = ROMEHyperParams.from_name(config)
  File "/home/ec2-user/SageMaker/FastEdit/fastedit/rome/rome_hparams.py", line 91, in from_name
    raise NotImplementedError
NotImplementedError
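
The traceback ends in `ROMEHyperParams.from_name` raising `NotImplementedError`, which suggests that `--config` is matched against a fixed set of preset names rather than a model path. Runs elsewhere on this page that get past this step pass `llama-7b` or `bloom-7b1`. The sketch below is hypothetical: `SketchHyperParams` and its preset table are not the repository's code, and the field values are copied from the hyperparameter dumps printed in the logs on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchHyperParams:
    """Hypothetical, trimmed-down stand-in for a preset-based hyperparameter class."""
    layers: List[int]
    v_lr: float
    v_loss_layer: int
    rewrite_module_tmp: str

    @classmethod
    def from_name(cls, name: str) -> "SketchHyperParams":
        # Only names in this table are accepted; anything else is rejected.
        presets = {
            "llama-7b": dict(layers=[5], v_lr=0.1, v_loss_layer=31,
                             rewrite_module_tmp="model.layers.{}.mlp.down_proj"),
            "bloom-7b1": dict(layers=[5], v_lr=0.2, v_loss_layer=29,
                              rewrite_module_tmp="transformer.h.{}.mlp.dense_4h_to_h"),
        }
        if name not in presets:
            raise NotImplementedError(f"unknown config {name!r}; known: {sorted(presets)}")
        return cls(**presets[name])


print(SketchHyperParams.from_name("llama-7b"))
try:
    SketchHyperParams.from_name("Baichuan-7B")
except NotImplementedError as err:
    print("rejected:", err)
```

Listing the known names in the exception, as this sketch does, would make the failure easier to diagnose than a bare `NotImplementedError`.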

GPU memory usage

I would like to understand GPU memory usage. When editing baichuan7b on a card with 24 GB of memory, some edits succeed while others report OOM; even a slightly longer sentence triggers OOM. What is the relationship between the length of the dataset examples and GPU memory usage?

RuntimeError: computing v Vector

Example:

[{"prompt": "{} was born in a city ", "subject": "Ada Yonath", "target": "Frankfurt",
"queries": ["The birth city of Ada Yonath was "]}]

Command:

CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor --data nobel_dataset.json --model bigscience/bloom-7b1 --config bloom-7b1

Output:

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
ROMEHyperParams(layers=[5], fact_token='subject_last', v_num_grad_steps=20, v_lr=0.2, v_loss_layer=29, v_weight_decay=0.001, clamp_norm_factor=4, kl_factor=0.0625, mom2_adjustment=False, rewrite_module_tmp='transformer.h.{}.mlp.dense_4h_to_h', layer_module_tmp='transformer.h.{}', mlp_module_tmp='transformer.h.{}.mlp', attn_module_tmp='transformer.h.{}.self_attention', ln_f_module='transformer.ln_f', lm_head_module='lm_head', mom2_dataset='wikipedia', mom2_n_samples=100000, mom2_dtype='float16')

################################
#                              #
#  Generating pre-update text  #
#                              #
################################
The birth city of Ada Yonath was Tel Aviv, Israel. She was born in the Tel Aviv neighborhood of Neve Shalom. Her father, Yitzhak Yonath, was a professor of physics at the Technion, and her mother, Shulamit, was a teacher. She has two brothers, Yaron and Yitzhak, and two sisters, Shira and Shulamit. She has a younger sister, Yael, who is a mathematician. She has a

############################
#                          #
#  Applying rome to model  #
#                          #
############################
Executing ROME algorithm for the update: [Ada Yonath was born in a city ] -> [Frankfurt]
Computing left vector (u)...
Selected u projection object Ada Yonath
Left vector shape: torch.Size([16384])
Computing right vector (v)
Traceback (most recent call last):
  File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/maxim758/FastEdit/fastedit/editor.py", line 79, in <module>
    fire.Fire(test_rome)
  File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/maxim758/FastEdit/fastedit/editor.py", line 55, in test_rome
    model_new, _ = apply_rome_to_model(
  File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/home/maxim758/FastEdit/fastedit/rome/rome_main.py", line 118, in execute_rome
    right_vector: torch.Tensor = compute_v(
  File "/home/maxim758/FastEdit/fastedit/rome/compute_v.py", line 47, in compute_v
    rewriting_targets[i, -target_len-1:-1] = input_tok["input_ids"][i, -target_len:].clone() # build labels
RuntimeError: The expanded size of the tensor (0) must match the existing size (18) at non-singleton dimension 0. Target sizes: [0]. Tensor sizes: [18]

Using online-quantized Baichuan 13B Chat raises LookupError: model.layers.5.mlp.down_proj.weight

Command run:

python -m fastedit.editor     --data data/fastedit.json     --model  ../../weights/Baichuan-13B-Chat/    --config llama-7b     --template baichuan
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/conda/lib/python3.10/runpy.py:196 in _run_module_as_main                                    │
│                                                                                                  │
│   193 │   main_globals = sys.modules["__main__"].__dict__                                        │
│   194 │   if alter_argv:                                                                         │
│   195 │   │   sys.argv[0] = mod_spec.origin                                                      │
│ ❱ 196 │   return _run_code(code, main_globals, None,                                             │
│   197 │   │   │   │   │    "__main__", mod_spec)                                                 │
│   198                                                                                            │
│   199 def run_module(mod_name, init_globals=None,                                                │
│                                                                                                  │
│ /opt/conda/lib/python3.10/runpy.py:86 in _run_code                                               │
│                                                                                                  │
│    83 │   │   │   │   │      __loader__ = loader,                                                │
│    84 │   │   │   │   │      __package__ = pkg_name,                                             │
│    85 │   │   │   │   │      __spec__ = mod_spec)                                                │
│ ❱  86 │   exec(code, run_globals)                                                                │
│    87 │   return run_globals                                                                     │
│    88                                                                                            │
│    89 def _run_module_code(code, init_globals=None,                                              │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/editor.py:79 in <module>                          │
│                                                                                                  │
│   76                                                                                             │
│   77                                                                                             │
│   78 if __name__ == "__main__":                                                                  │
│ ❱ 79 │   fire.Fire(test_rome)                                                                    │
│   80                                                                                             │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/fire/core.py:141 in Fire                                 │
│                                                                                                  │
│   138 │   context.update(caller_globals)                                                         │
│   139 │   context.update(caller_locals)                                                          │
│   140                                                                                            │
│ ❱ 141   component_trace = _Fire(component, args, parsed_flag_args, context, name)                │
│   142                                                                                            │
│   143   if component_trace.HasError():                                                           │
│   144 │   _DisplayError(component_trace)                                                         │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/fire/core.py:475 in _Fire                                │
│                                                                                                  │
│   472 │     is_class = inspect.isclass(component)                                                │
│   473 │                                                                                          │
│   474 │     try:                                                                                 │
│ ❱ 475 │   │   component, remaining_args = _CallAndUpdateTrace(                                   │
│   476 │   │   │   component,                                                                     │
│   477 │   │   │   remaining_args,                                                                │
│   478 │   │   │   component_trace,                                                               │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/fire/core.py:691 in _CallAndUpdateTrace                  │
│                                                                                                  │
│   688 │   loop = asyncio.get_event_loop()                                                        │
│   689 │   component = loop.run_until_complete(fn(*varargs, **kwargs))                            │
│   690   else:                                                                                    │
│ ❱ 691 │   component = fn(*varargs, **kwargs)                                                     │
│   692                                                                                            │
│   693   if treatment == 'class':                                                                 │
│   694 │   action = trace.INSTANTIATED_CLASS                                                      │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/editor.py:55 in test_rome                         │
│                                                                                                  │
│   52 │   │   print("\n\n".join([queries[i] + " " + pre_update_text[i] for i in range(len(quer    │
│   53 │                                                                                           │
│   54 │   print_loud(f"Applying rome to model")                                                   │
│ ❱ 55 │   model_new, _ = apply_rome_to_model(                                                     │
│   56 │   │   model_old,                                                                          │
│   57 │   │   tokenizer,                                                                          │
│   58 │   │   requests,                                                                           │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:56 in apply_rome_to_model       │
│                                                                                                  │
│    53 │   weights_diff = {}                                                                      │
│    54 │                                                                                          │
│    55 │   for request in requests:                                                               │
│ ❱  56 │   │   deltas = execute_rome(model, tokenizer, request, hparams, batch_first)             │
│    57 │   │                                                                                      │
│    58 │   │   with torch.no_grad():                                                              │
│    59 │   │   │   for w_name, (delta_u, delta_v) in deltas.items():                              │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:97 in execute_rome              │
│                                                                                                  │
│    94 │   start_time = time.time()                                                               │
│    95 │                                                                                          │
│    96 │   # Retrieve weights that user desires to change                                         │
│ ❱  97 │   weights = {f"{hparams.rewrite_module_tmp.format(layer)}.weight":                       │
│    98 │   │   │      nethook.get_parameter(model, f"{hparams.rewrite_module_tmp.format(layer)}   │
│    99 │   │   │      for layer in hparams.layers}                                                │
│   100                                                                                            │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/rome/rome_main.py:98 in <dictcomp>                │
│                                                                                                  │
│    95 │                                                                                          │
│    96 │   # Retrieve weights that user desires to change                                         │
│    97 │   weights = {f"{hparams.rewrite_module_tmp.format(layer)}.weight":                       │
│ ❱  98 │   │   │      nethook.get_parameter(model, f"{hparams.rewrite_module_tmp.format(layer)}   │
│    99 │   │   │      for layer in hparams.layers}                                                │
│   100 │                                                                                          │
│   101 │   # Save old weights for future restoration                                              │
│                                                                                                  │
│ /workspace/llm/baichuan/code/FastEdit/fastedit/utils/nethook.py:372 in get_parameter             │
│                                                                                                  │
│   369 │   for n, p in model.named_parameters():                                                  │
│   370 │   │   if n == name:                                                                      │
│   371 │   │   │   return p                                                                       │
│ ❱ 372 │   raise LookupError(name)                                                                │
│   373                                                                                            │
│   374                                                                                            │
│   375 def replace_module(model, name, new_module):                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
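
The traceback shows `nethook.get_parameter` comparing the requested name against `model.named_parameters()` and raising `LookupError` when nothing matches. A quick way to diagnose this is to list the parameter names the loaded model actually registers under the target layer. The sketch below does that on a toy module; `TinyModel` and `parameter_names` are hypothetical stand-ins. One possible cause, which has not been verified here, is that quantization replaces the linear layer so that no parameter with that exact name exists.

```python
import torch.nn as nn


class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.down_proj = nn.Linear(4, 2, bias=False)


class TinyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = TinyMLP()


class TinyModel(nn.Module):
    """Toy stand-in; a real checkpoint would be loaded instead."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([TinyLayer() for _ in range(2)])


def parameter_names(model, prefix):
    """List registered parameter names that start with `prefix`."""
    return [name for name, _ in model.named_parameters() if name.startswith(prefix)]


model = TinyModel()
print(parameter_names(model, "layers.1.mlp"))
```

On the real model the prefix would be `model.layers.5.mlp`; an empty or differently named result would explain the `LookupError`.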

Why modifying down_proj in llama?

Hello,

Thank you very much for this implementation.

In the Llama implementation, I wonder why and how you choose to edit the down_proj layer instead of gate_proj or up_proj in the MLP module? Thank you very much!

Best,
Wenyue

It seems a value is ignored in the delta calculation?

if hparams.mom2_adjustment:
    u = get_inv_cov(
        model,
        tokenizer,
        hparams.rewrite_module_tmp.format(layer),
        hparams.mom2_dataset,
        hparams.mom2_n_samples,
        hparams.mom2_dtype
    ) @ u.unsqueeze(1)
    u = u.squeeze()

I noticed that get_inv_cov is not implemented. Its value corresponds to the constant C in the original paper (shown in an image attached to the issue).

And for this code snippet:

right_vector = (target - cur_output) / torch.dot(cur_input, left_vector)

The calculation of Λ simply ignores this constant.
In my experiments, this may cause a small fraction of edits to fail to apply.

I wonder why the get_inv_cov function was left unimplemented. If it is tricky to implement, is there an alternative, such as adding per-model constants directly to the hyperparameters?

Looking forward to your reply. 🙂
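
For context on the issue above, the rank-one update in the ROME paper has the form below, where $k_*$ is the key vector for the subject, $v_*$ is the optimized value vector, and $C$ is the uncentered covariance of keys. This is a restatement from the paper, not something taken from the FastEdit code.

```latex
\hat{W} = W + \Lambda \,(C^{-1} k_*)^{\top},
\qquad
\Lambda = \frac{v_* - W k_*}{(C^{-1} k_*)^{\top} k_*}
```

In the quoted snippet, `left_vector` appears to play the role of $C^{-1} k_*$ and `right_vector` the role of $\Lambda$; any rescaling of the left vector cancels in their product. Under that reading, running with `mom2_adjustment=False` computes the update as if $C$ were the identity.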

LLaMA-2-7b-chat Editing failed

CUDA_VISIBLE_DEVICES=7 python -m fastedit.editor \
    --data data/example.json \
    --model /path/to/Llama-2-7b-chat-hf \
    --config llama-7b \
    --template default

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

The values become inf during generation. Is this method only meant for base pretrained models such as Llama-2-7b-hf?

[Llama-2-7b-chat] RuntimeError: expected scalar type Float but found Half

Loading the 32-bit Llama-2-7b-chat-hf model directly:
model = AutoModelForCausalLM.from_pretrained(
    model_path
)
produces the following error:

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 3.252 = 3.252 + 0.0 avg prob of [IV fluids and furosemide] 0.0395
loss 2.999 = 2.996 + 0.003 avg prob of [IV fluids and furosemide] 0.0508
loss 2.518 = 2.51 + 0.009 avg prob of [IV fluids and furosemide] 0.0823
loss 2.148 = 2.056 + 0.092 avg prob of [IV fluids and furosemide] 0.1295
loss 1.609 = 1.539 + 0.07 avg prob of [IV fluids and furosemide] 0.2176
loss 1.005 = 0.935 + 0.07 avg prob of [IV fluids and furosemide] 0.395
loss 0.443 = 0.349 + 0.094 avg prob of [IV fluids and furosemide] 0.7071
loss 0.168 = 0.09 + 0.079 avg prob of [IV fluids and furosemide] 0.9143
loss 0.059 = 0.025 + 0.034 avg prob of [IV fluids and furosemide] 0.9755
loss 0.055 = 0.019 + 0.036 avg prob of [IV fluids and furosemide] 0.9812
loss 0.042 = 0.008 + 0.035 avg prob of [IV fluids and furosemide] 0.9923
loss 0.037 = 0.005 + 0.032 avg prob of [IV fluids and furosemide] 0.9954
loss 0.035 = 0.004 + 0.031 avg prob of [IV fluids and furosemide] 0.9957
loss 0.032 = 0.004 + 0.028 avg prob of [IV fluids and furosemide] 0.9963
loss 0.029 = 0.003 + 0.026 avg prob of [IV fluids and furosemide] 0.9969
loss 0.026 = 0.003 + 0.023 avg prob of [IV fluids and furosemide] 0.9973
loss 0.023 = 0.002 + 0.02 avg prob of [IV fluids and furosemide] 0.9976
loss 0.02 = 0.002 + 0.018 avg prob of [IV fluids and furosemide] 0.9979
loss 0.019 = 0.002 + 0.017 avg prob of [IV fluids and furosemide] 0.998
loss 0.017 = 0.002 + 0.015 avg prob of [IV fluids and furosemide] 0.9982
Delta norm: 17.499
Change in target norm: 4.375 to 18.048 => 13.673
Division Factor: 3.688
Right vector norm: 4.746
Right vector shape: torch.Size([4096])

Traceback (most recent call last):
  File "/data/a/zhangbo/CAP_medical_LLM/evaluate_model_with_multiple_datasets.py", line 300, in <module>
    edit_model(global_model, global_tokenizer, list_of_dicts, 'llama-7b')
  File "/data/a/zhangbo/CAP_medical_LLM/edit_util.py", line 50, in edit_model
    model_new, _ = apply_rome_to_model(
  File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 134, in execute_rome
    upd_matrix = left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)
RuntimeError: expected scalar type Float but found Half

======

If the model is loaded in 16-bit instead:
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
).bfloat16()

a similar error occurs:
RuntimeError: expected scalar type BFloat16 but found Half
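
The Float/Half error is raised at `left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)`, where the two vectors have different dtypes; the BFloat16/Half error is reported as similar. A minimal sketch of one workaround follows, assuming it is acceptable to cast the right vector to the left vector's dtype before the outer product. Which dtype to prefer is an assumption here, not something the thread settles, and the tensors are illustrative.

```python
import torch

# Illustrative stand-ins with mismatched dtypes, as in the Float-vs-Half error.
left_vector = torch.arange(8, dtype=torch.float32)
right_vector = torch.arange(4, dtype=torch.float32).half()

# Cast one operand so both share a dtype before forming the rank-one matrix.
right_cast = right_vector.to(left_vector.dtype)
upd_matrix = left_vector.unsqueeze(1) @ right_cast.unsqueeze(0)
print(upd_matrix.shape, upd_matrix.dtype)
```

The resulting matrix may then need a final cast to the dtype of the weight it is added to.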

Error at runtime

################################
#                              #
#  Retrieving hyperparameters  #
#                              #
################################
Traceback (most recent call last):
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lyn/FastEdit/fastedit/editor.py", line 79, in <module>
    fire.Fire(test_rome)
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/lyn/miniconda3/envs/fastedit/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/lyn/FastEdit/fastedit/editor.py", line 46, in test_rome
    hparams = ROMEHyperParams.from_name(config)
  File "/home/lyn/FastEdit/fastedit/rome/rome_hparams.py", line 97, in from_name
    raise NotImplementedError
NotImplementedError

Why does running the following command raise this error? I have already downloaded the model to a local directory:
CUDA_VISIBLE_DEVICES=0 python -m fastedit.editor \
    --data data/example.json \
    --model /home/lyn/gpt2-xl \
    --config gpt2-xl \
    --template default
