
howuhh / faster-trajectory-transformer


Implementation of Trajectory Transformer with attention caching and batched beam search

License: MIT License

Python 100.00%
reinforcement-learning trajectory-transformer transformer


faster-trajectory-transformer's Issues

Jax Code

Hi,

Thank you very much for your contribution.

I would like to ask: is it possible to release the code based on JAX?

Best

training input and target

Hi, I have another question regarding the training input and target.
Why is the training input x sliced from the flattened tensor? As I understand it, the input x drops the last token of the flattened sequence, while the target y drops the first token (the first dimension of obs at the first step). Shouldn't the inputs be sliced along seq_len rather than flattening everything and only removing a single token?
In the code below, shouldn't we avoid reshaping and slicing it this way?

joined_discrete = self.discretizer.encode(joined).reshape(-1).astype(np.long)
loss_pad_mask = loss_pad_mask.reshape(-1)
return joined_discrete[:-1], joined_discrete[1:], loss_pad_mask[:-1]
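
For concreteness, here is a toy sketch of the slicing I mean, with made-up shapes (the token layout below is just for illustration, not taken from the repo):

import numpy as np

# toy example: 3 transitions, 4 tokens per transition (obs, act, reward, value),
# flattened into one stream of 12 discrete tokens
joined_discrete = np.arange(12)

x = joined_discrete[:-1]   # input:  tokens 0..10
y = joined_discrete[1:]    # target: tokens 1..11, i.e. the next token for each input

# the shift is a single token over the flattened stream, so every token
# (not every transition) is trained to predict its successor
assert (x[1:] == y[:-1]).all()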

q,k,v linear layers

Hi, can I ask why q, k, v are the input itself instead of being passed through a linear layer?
Compared to the original code, this seems to be different.
Thanks!
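
For reference, this is roughly the pattern I am comparing against in the original minGPT-style code, where q, k and v each come from a learned linear layer applied to the same input (a simplified sketch; the causal mask, dropout, and output projection are omitted, and the names are illustrative):

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionSketch(nn.Module):
    # GPT-style attention: q, k, v are linear projections of the same input x
    def __init__(self, embed_dim, n_heads):
        super().__init__()
        assert embed_dim % n_heads == 0
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)
        self.value = nn.Linear(embed_dim, embed_dim)
        self.n_heads = n_heads

    def forward(self, x):
        B, T, C = x.shape
        hs = C // self.n_heads
        # project the shared input into separate query/key/value spaces
        q = self.query(x).view(B, T, self.n_heads, hs).transpose(1, 2)
        k = self.key(x).view(B, T, self.n_heads, hs).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_heads, hs).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(hs)   # causal mask omitted for brevity
        att = F.softmax(att, dim=-1)
        return (att @ v).transpose(1, 2).reshape(B, T, C)

(If the faster implementation passes the raw input as query/key/value into something like torch.nn.MultiheadAttention, the learned projections still happen inside that module, so maybe the difference is only where the projection lives; I am not sure whether that is the case here.)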

possible issue with rewards to go?

values[t] = (rewards[t + 1:] * discounts[:-t - 1]).sum()

Hi, I have been using this repo for some experiments, and while digging into parts of it I was wondering whether this is correct. These values don't seem to match up with what I see for the sample trajectory from https://github.com/jannerm/trajectory-transformer/blob/8834a6ed04ceeab8fdb9465e145c6e041c05d71b/trajectory/datasets/sequence.py#L97

There is also a high likelihood I am wrong, but the rewards-to-go seem much larger than I would expect. If so, it is possible this line is just supposed to be (rewards[t + 1:].T @ discounts[:-t - 1]) (which then matches the original repo's RTG values for me), but I am not certain I am understanding everything correctly.

For comparison, on halfcheetah-medium-v2 the RTG sum for the first trajectory using your calculation:

(Pdb) values.sum()
242487360.0

while on the original repo:

(Pdb) self.values_segmented[0].sum()
432073.4913363826
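
Here is a small numpy sketch of how I think the two expressions can diverge, assuming rewards is stored as a (T, 1) column vector and discounts as a flat (T,) array (I may be wrong about the actual shapes in the repo):

import numpy as np

T = 5
rewards = np.ones((T, 1))          # (T, 1) column vector (assumed)
discounts = 0.99 ** np.arange(T)   # (T,)

t = 0
# (T-1, 1) * (T-1,) broadcasts to a (T-1, T-1) matrix, so .sum() adds
# every reward times every discount instead of a single discounted sum
broadcast_sum = (rewards[t + 1:] * discounts[:-t - 1]).sum()        # ~15.76

# the dot-product form proposed above reduces over a single axis
dot_sum = (rewards[t + 1:].T @ discounts[:-t - 1]).item()           # ~3.94

print(broadcast_sum, dot_sum)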

Novel Dataset Preparation

Do you have any recommendations or resources you could point me to for preparing a novel dataset for use in Trajectory Transformer?
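
In case it helps frame the question, I am imagining a D4RL-style layout of flat, aligned transition arrays, roughly like the sketch below (key names and shapes are just my assumption from that convention, not something I know this repo requires):

import numpy as np

# hypothetical dataset layout following the D4RL convention
N, obs_dim, act_dim = 10_000, 17, 6
dataset = {
    "observations": np.zeros((N, obs_dim), dtype=np.float32),
    "actions":      np.zeros((N, act_dim), dtype=np.float32),
    "rewards":      np.zeros((N,), dtype=np.float32),
    "terminals":    np.zeros((N,), dtype=bool),   # episode ended in the MDP
    "timeouts":     np.zeros((N,), dtype=bool),   # episode cut off by a time limit
}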
