Comments (14)
Yes, of course: returning the result, xs[-1]. Thank you once again for the fast replies and for resolving my problem.
from s5-pytorch.
Hey,
apply_ssm is the parallel formulation of S5; if you want to carry state you should use forward_rnn (line 227 in f0fb132) and initial_state (line 313 in f0fb132). This code path hasn't been tested much though, so there might be bugs (let me know).
For training you usually want to use the parallel formulation, which increases the speed of training and reduces memory (if using regular autograd), and use forward_rnn for inference speed/memory.
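The equivalence of the two formulations can be sketched on toy scalars (plain Python, not the library's API): the parallel path computes the same states as the recurrence by cumulatively applying an S5-style associative operator over (A, b) pairs, which is why the scan and the RNN loop are interchangeable over a full sequence:

```python
from itertools import accumulate

def combine(qi, qj):
    """S5-style associative operator on (A, b) pairs."""
    a_i, b_i = qi
    a_j, b_j = qj
    return a_j * a_i, a_j * b_i + b_j

# Toy scalar stand-ins for the diagonal recurrence x_k = a_k * x_{k-1} + b_k.
a = [0.9, 0.8, 0.7, 0.6]   # per-step transition (like Lambda_bar)
b = [1.0, 2.0, 3.0, 4.0]   # per-step input (like Bu)

# Parallel formulation: cumulative combine (an associative scan).
xs_scan = [s for _, s in accumulate(zip(a, b), combine)]

# Recurrent formulation: the plain loop that forward_rnn would run.
x, xs_loop = 0.0, []
for ak, bk in zip(a, b):
    x = ak * x + bk
    xs_loop.append(x)

assert all(abs(p - q) < 1e-12 for p, q in zip(xs_scan, xs_loop))
```

The same identity is what lets a real implementation dispatch the combine in parallel (e.g. a log-depth tree) instead of the sequential loop.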
from s5-pytorch.
If you want to mix them, I think you'll need to extract the last element of xs from apply_ssm:
- return torch.vmap(lambda x: (C_tilde @ x).real)(xs) + Du
+ return torch.vmap(lambda x: (C_tilde @ x).real)(xs) + Du, xs[-1]
from s5-pytorch.
I understand, but what I meant by this question is: if I now return xs[-1], how can it be reused in apply_ssm? I tried:
Lambda_bars = Lambda_bars * prev_state (where prev_state is the returned xs[-1])
_, xs = associative_scan(binary_operator, (Lambda_bars, Bu_elements))
but this didn't work.
What I am talking about is functionality like S4's state-forwarding mode:
https://github.com/HazyResearch/state-spaces/tree/main/models/s4#state-forwarding
from s5-pytorch.
Ah, I see. Referencing the paper, it seems like you would need to inject it as the initial state of associative_scan; however, the JAX implementation I ported doesn't actually support that. I think
Lambda_bars[0] = Lambda_bars[0] * prev_state
with prev_state = xs[-1] should do the equivalent. Note this would be after Lambda_bars has been tiled (just before the first associative_scan call).
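As a hedged sanity check on toy scalars (plain Python; the names are stand-ins for Lambda_bars / Bu_elements, not the repo's code): with the usual (A, b) combine, the per-step states are accumulated through the b component, so a previous state x_{-1} = prev_state reproduces the recurrence when Lambda_bar * prev_state is folded into the first input element. Whatever injection point ends up in the patch is worth verifying numerically against the plain recurrence like this:

```python
from itertools import accumulate

def combine(qi, qj):
    """S5-style associative operator on (A, b) pairs."""
    a_i, b_i = qi
    a_j, b_j = qj
    return a_j * a_i, a_j * b_i + b_j

lam = [0.9, 0.8, 0.7]        # toy stand-in for tiled Lambda_bars
bu = [1.0, 2.0, 3.0]         # toy stand-in for Bu_elements
prev_state = 5.0             # hypothetical xs[-1] carried from the last chunk

# Reference: run the recurrence starting from x_{-1} = prev_state.
x, ref = prev_state, []
for ak, bk in zip(lam, bu):
    x = ak * x + bk
    ref.append(x)

# Injection: fold Lambda_bar * prev_state into the first input element,
# since x_0 = lam[0] * prev_state + bu[0].
bu_inj = [bu[0] + lam[0] * prev_state] + bu[1:]
xs = [s for _, s in accumulate(zip(lam, bu_inj), combine)]

assert all(abs(p - q) < 1e-12 for p, q in zip(xs, ref))
```

Any candidate patch can be dropped into the scan half of this check and compared against the reference loop on random inputs before being trusted.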
from s5-pytorch.
I see, so:
if Lambda_bars.ndim == 1:  # Repeat for associative_scan
    Lambda_bars = Lambda_bars.tile(input_sequence.shape[0], 1)
Lambda_bars[0] = Lambda_bars[0] * prev_state
_, xs = associative_scan(binary_operator, (Lambda_bars, Bu_elements))
should do it?
Then I can have apply_ssm with prev_state:
apply_ssm(Lambda_bars: torch.Tensor, B_bars, C_tilde, D, input_sequence, prev_state, bidir: bool = False)
and carry it on:
LOOP:
x, states = s5(x, states)
END:
from s5-pytorch.
Yes, with the other return patch, that should be correct. I'll probably add this functionality to the main repo once it's validated.
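The LOOP above can be sketched with a toy stand-in for the S5 block (hypothetical s5_toy, not the repo's API): processing the sequence in chunks while carrying the returned state should match a single full-length pass, which is the property state forwarding is meant to provide:

```python
def s5_toy(u, state=0.0, lam=0.9):
    """Hypothetical stand-in for s5(x, states): returns (outputs, final state)."""
    xs = []
    for uk in u:
        state = lam * state + uk
        xs.append(state)
    return xs, state

u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# One pass over the whole sequence.
full, _ = s5_toy(u)

# LOOP: x, states = s5(x, states) over chunks, carrying the state.
state, chunked = 0.0, []
for chunk in (u[:2], u[2:4], u[4:]):
    xs, state = s5_toy(chunk, state)
    chunked.extend(xs)

assert all(abs(p - q) < 1e-12 for p, q in zip(chunked, full))
```

The chunk boundaries are arbitrary; with correct state forwarding, any partition of the input yields the same outputs as the unpartitioned run.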
from s5-pytorch.