Hi, thanks for the great work!
You mentioned in your paper that "As a reference, unconstrained Atari agents are usually trained for 50 million steps" and "Each run utilized around 12GB of VRAM and took approximately 2.9 days on a single Nvidia RTX 4090".
I noticed that in your code, the outer loop calls self.train_agent() 1000 times, and inside it the agent is trained via self.train_component(name, steps) for 5000 steps per call.
That amounts to 1000 * 5000 = 5,000,000 agent training steps. The inner-loop progress bar reads 496/5000 [12:59<1:09:31, 1.08it/s], which works out to roughly 1.3 hours per outer iteration.
So I estimate the total at well over 1000 hours, and that is just for training the agent model.
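For reference, here is the back-of-the-envelope calculation I'm doing (a rough sketch; the loop counts are just how I read the code, and I'm assuming the tqdm rate stays roughly constant for the whole run):

```python
# Rough estimate of agent-training wall-clock time, based on the tqdm line above.
# Assumes every one of the 1000 outer iterations runs the full 5000 inner steps
# at a constant ~1.08 it/s (all numbers come from my reading of the code / progress bar).
outer_iters = 1000        # calls to self.train_agent() in the outer loop
inner_steps = 5000        # steps per self.train_component(name, steps) call
steps_per_sec = 1.08      # rate reported by tqdm

total_steps = outer_iters * inner_steps               # 5,000,000 steps
total_hours = total_steps / steps_per_sec / 3600      # ~1286 hours
print(f"{total_steps:,} steps -> ~{total_hours:.0f} h (~{total_hours / 24:.0f} days)")
```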
Did I overlook something? And what should I do to reproduce the reported training time? @eloialonso @AdamJelley