
kv cache breaks generation (x-transformers issue, closed)

ad8e commented on May 20, 2024
kv cache breaks generation

from x-transformers.

Comments (5)

ad8e commented on May 20, 2024

Thanks, this fixes the issue and is better than what I would have PR'd. Sorry that I dumped two bad test cases and then went to sleep.

> @ad8e as for your following offer, it is ok

To clarify, do you mean it is ok to do it, or "it is ok" as in it is not necessary?

> have a great new years Kevin

You too!


lucidrains commented on May 20, 2024

> Thanks, this fixes the issue and is better than what I would have PR'd. Sorry that I dumped two bad test cases and then went to sleep.
>
> > @ad8e as for your following offer, it is ok
>
> To clarify, do you mean it is ok to do it, or "it is ok" as in it is not necessary?
>
> > have a great new years Kevin
>
> You too!

it isn't necessary, not for this lib


lucidrains commented on May 20, 2024

@ad8e hey Kevin

thanks for reporting

i quickly checked on a test script and it seems to be fine

import torch

from x_transformers import (
    TransformerWrapper,
    Decoder,
    AutoregressiveWrapper
)

model = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim = 8,
        depth = 1,
        heads = 4
    )
)

model = AutoregressiveWrapper(model)

# token ids must be an integer tensor for the embedding lookup
prompts = torch.zeros((1, 1), dtype = torch.long)

generated = model.generate(
    prompts,
    seq_len = 100,
    temperature = 0.,
    cache_kv = False
)

kv_cache_generated = model.generate(
    prompts,
    seq_len = 100,
    temperature = 0.,
    cache_kv = True
)

assert torch.allclose(generated, kv_cache_generated)

could you modify the script so that it breaks? perhaps you are using some hyperparameter that is incompatible with kv cache (if so, it would be good to patch the can_cache_kv logic)

logits and logits2 in your code have different shapes. you need to compare logits and logits2[:, -1:]


lucidrains commented on May 20, 2024

@ad8e as for your following offer, it is ok, as the library is model architecture specific. what you mention is all training related


lucidrains commented on May 20, 2024

@ad8e ah, got to the bottom of it Kevin

so it turns out the default (absolute positional embedding) is not kv cache friendly once you exceed the maximum sequence length (context window). however, it should still work when decoding from 1st token to the max context window size
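
One way to picture why this happens (an illustrative reading, not code from the library): once the sequence outgrows the context window and the window slides forward, every token that stays in view is assigned a new absolute position index, so keys and values cached under the old index no longer match. The offsets between tokens do not change, which is the quantity relative schemes such as rotary embeddings depend on. The helper `window_positions` below is hypothetical and uses plain Python only.

```python
def window_positions(total_len, max_seq_len):
    """Map each token still inside the context window to the absolute
    position index it would receive after truncating to the last
    max_seq_len tokens. Hypothetical helper for illustration only."""
    start = max(0, total_len - max_seq_len)
    visible_tokens = range(start, total_len)  # global token indices in view
    return {tok: pos for pos, tok in enumerate(visible_tokens)}


max_seq_len = 4

before = window_positions(4, max_seq_len)  # window exactly full
after = window_positions(5, max_seq_len)   # one token past the window

# tokens that are visible both before and after the window slides
shared = before.keys() & after.keys()

# tokens whose absolute position index changed: their cached keys / values
# were computed with an embedding for a position they no longer occupy
stale = {tok for tok in shared if before[tok] != after[tok]}

# pairwise offsets between shared tokens, before and after the slide
rel_before = {(a, b): before[a] - before[b] for a in shared for b in shared}
rel_after = {(a, b): after[a] - after[b] for a in shared for b in shared}

print("stale cached tokens:", sorted(stale))
print("relative offsets unchanged:", rel_before == rel_after)
```

Within the first `max_seq_len` tokens nothing is re-indexed, which is consistent with the cache still working when decoding from the first token up to the context window size.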

i added an assert to prevent this, but also defaulted the enwik8 training script to use rotary positions, which are the preferred positional embeddings these days (llama), and kv cache friendly when exceeding context length.
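
A guard of the kind described above might look roughly like the following. This is a minimal sketch under assumptions: the function name `assert_kv_cache_allowed` and all of its parameters are hypothetical and are not taken from the library's source.

```python
def assert_kv_cache_allowed(cache_kv, current_len, max_seq_len, uses_abs_pos_emb):
    """Hypothetical guard: refuse to decode with a kv cache past the context
    window when the model relies on absolute positional embeddings."""
    max_len_exceeded = current_len > max_seq_len
    assert not (cache_kv and max_len_exceeded and uses_abs_pos_emb), (
        "cannot use cached key / values when decoding beyond the max sequence "
        "length with absolute positional embeddings; consider rotary positions"
    )


# decoding up to the context window is fine even with absolute positions
assert_kv_cache_allowed(True, 1024, 1024, uses_abs_pos_emb=True)

# going past the window is fine when absolute positions are not used
assert_kv_cache_allowed(True, 2000, 1024, uses_abs_pos_emb=False)
```

Failing loudly here is preferable to silently producing different tokens with and without the cache, which is the symptom reported in this issue.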

ok, back to the holidays; have a great new years Kevin
