
Comments (7)

hiyouga commented on August 25, 2024

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
)
Don't use bf16.

from fastedit.

tczbzb commented on August 25, 2024

Without bf16, llama2 raises this error: meta-llama/llama#380

However, even if I load the 32-bit model directly, the error described above still occurs:

model = AutoModelForCausalLM.from_pretrained(model_path)

RuntimeError: expected scalar type Float but found Half


hiyouga commented on August 25, 2024

The LLaMA2 overflow problem is indeed still unresolved. A later version will fix it; for now LLaMA2 cannot be used directly.


tczbzb commented on August 25, 2024

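Understood. In the meantime, can I edit the failing line in rome_main.py myself and force-cast Half to Float to get past this error? Or would that change cause other problems?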


hiyouga commented on August 25, 2024

It is best to wait for our fix.


tczbzb commented on August 25, 2024

Thanks a lot! One more piece of information: if I do not use .bfloat16(), for example:

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
)

then the edit runs to completion, but every probability inside it is nan, and inference then fails.

Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss nan = nan + 0.0 avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
Delta norm: nan
Change in target norm: 4.391 to nan => nan
Division Factor: 3.689
Right vector norm: nan
Right vector shape: torch.Size([4096])
Deltas successfully computed for ['model.layers.5.mlp.down_proj.weight']
Time elapsed: 12.56 seconds
New weights successfully inserted into ['model.layers.5.mlp.down_proj.weight']

RuntimeError: probability tensor contains either inf, nan or element < 0
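
Earlier in this thread the maintainer attributes the LLaMA2 failure to an overflow problem. As a rough illustration of how a single overflowed value can turn every probability into nan, here is a minimal pure-Python sketch. It does not reproduce fastedit's code: `to_fp16` and `softmax` are hypothetical helpers written for this note, and 70000.0 is an arbitrary value chosen only because it exceeds the float16 maximum of 65504.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision; out-of-range values become +/-inf."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # The "e" struct format is the 16-bit half-precision float.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the float16 range; mimic hardware overflow.
        return math.copysign(math.inf, x)


def softmax(values):
    """Numerically stabilized softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]  # inf - inf yields nan
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical activations: one value is larger than float16 can represent.
activations = [70000.0, 1.0]

print(softmax(activations))                        # full precision stays finite
print(softmax([to_fp16(v) for v in activations]))  # overflowed input gives nan
```

Once one entry is inf, the stabilizing subtraction computes inf - inf, which is nan, and the nan then spreads through the sum to every output. That matches the pattern in the log above, where the loss and the average probability are nan from the first step on, although the actual overflowing operation inside the model is not identified in this thread.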


hiyouga commented on August 25, 2024

I forgot to mention: without switching to another data type, simply setting tokenizer.pad_token = tokenizer.unk_token also avoids the problem above.
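
A minimal sketch of that assignment, wrapped in a small helper so it can be tried without downloading a model. The helper name `use_unk_as_pad` is hypothetical, and the `SimpleNamespace` object is only a stand-in that mimics the `pad_token` and `unk_token` attributes of a Hugging Face tokenizer.

```python
from types import SimpleNamespace


def use_unk_as_pad(tokenizer):
    """Reuse the unknown token as the padding token, as suggested above."""
    tokenizer.pad_token = tokenizer.unk_token
    return tokenizer


# Stand-in for a tokenizer that ships without a padding token (assumption).
fake_tokenizer = SimpleNamespace(pad_token=None, unk_token="<unk>")
use_unk_as_pad(fake_tokenizer)
print(fake_tokenizer.pad_token)
```

With a real tokenizer, the same call would presumably go right after the tokenizer is loaded and before any batch is padded.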

