steventrouble / efficientzero

This project forked from yewr/efficientzero

7.0 7.0 2.0 2.22 MB

Fork of EfficientZero to use newer libraries and to fix a few runtime bugs. Also includes pretrained models!

License: GNU General Public License v3.0

Shell 0.55% C++ 9.74% Python 86.62% Cython 3.09%

efficientzero's People

Contributors

jl1990, steventrouble, yewr

efficientzero's Issues

FR: Work better on differently sized GPUs

GPUs differ in memory capacity, but the current implementation doesn't account for this. We should detect the amount of memory on each GPU (or accept it as a flag) and automatically spin up the number of workers that fits in that memory.
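A minimal sketch of the proposed behavior, assuming each worker needs a roughly fixed amount of GPU memory that has to be measured by hand. The function names, the --gpu_mem_gb and --mem_per_worker_gb flags, the 1 GB reserve and the 4 GB default are hypothetical placeholders, not part of EfficientZero. Detection uses PyTorch's torch.cuda.get_device_properties(i).total_memory, and the flag overrides it.

```python
"""Sketch: pick a per-GPU worker count from each GPU's memory.

Assumption: every worker needs roughly the same fixed amount of GPU memory
(mem_per_worker_gb, a hypothetical value you must measure). Nothing here
comes from the EfficientZero code base.
"""
import argparse
from typing import List, Optional


def workers_per_gpu(total_mem_gb: List[float], mem_per_worker_gb: float,
                    reserve_gb: float = 1.0,
                    max_workers: Optional[int] = None) -> List[int]:
    """Return how many workers fit on each GPU after keeping reserve_gb free."""
    if mem_per_worker_gb <= 0:
        raise ValueError("mem_per_worker_gb must be positive")
    counts = []
    for total in total_mem_gb:
        usable = max(total - reserve_gb, 0.0)  # never negative
        n = int(usable // mem_per_worker_gb)   # whole workers only
        if max_workers is not None:
            n = min(n, max_workers)
        counts.append(n)
    return counts


def detect_gpu_mem_gb() -> List[float]:
    """Total memory (GiB) of each visible GPU via PyTorch; [] if unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_properties(i).total_memory / 1024 ** 3
            for i in range(torch.cuda.device_count())]


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flags: override auto-detection, e.g. --gpu_mem_gb 40 24
    parser.add_argument("--gpu_mem_gb", type=float, nargs="*", default=None)
    # Placeholder default; measure a real worker's footprint instead.
    parser.add_argument("--mem_per_worker_gb", type=float, default=4.0)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    mems = args.gpu_mem_gb if args.gpu_mem_gb else detect_gpu_mem_gb()
    print(workers_per_gpu(mems, args.mem_per_worker_gb))
```

For example, workers_per_gpu([40.0, 24.0], 4.0) gives [9, 5]: one GB is held back on each card and the rest is divided into 4 GB slots.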

DataWorker freezes for unknown reason

While running, the DataWorker occasionally freezes without raising any error. The output of ray stack doesn't show it blocked anywhere in our code, so it's unclear why or where it's freezing.
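One way to narrow this down is a watchdog built on Python's standard-library faulthandler: arm a timer before each unit of work and cancel it afterwards, so a stall longer than the timeout dumps every thread's Python stack. This is a sketch, assuming the DataWorker runs a loop that can be wrapped; hang_watchdog, make_batch and the 300 s timeout are hypothetical. faulthandler only sees Python frames, so a hang inside native code appears as the Python call that entered it.

```python
"""Sketch: dump all thread stacks if a block of work runs too long.

Assumption: the worker's main loop can be wrapped in this context manager.
hang_watchdog is a hypothetical helper, not part of EfficientZero or Ray.
"""
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s: float, file=None):
    """Dump every thread's stack to `file` if the body exceeds timeout_s."""
    # faulthandler writes straight to a file descriptor, so `file` needs fileno().
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=target)
    try:
        yield
    finally:
        # Disarm the timer once the body finishes or raises.
        faulthandler.cancel_dump_traceback_later()


# Usage inside a worker loop (hypothetical names):
# while True:
#     with hang_watchdog(300):
#         batch = make_batch()
```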

GPU Worker crashes with no error or stack dump

As soon as the GPU worker starts training, it crashes without an error message or stack trace. All I get is:

2022-07-07 19:05:41,609 WARNING worker.py:1404 -- A worker died or was killed
while executing a task by an unexpected system error. To troubleshoot the
problem, check the logs for the dead worker. 
RayTask ID: ffffffffffffffff2aeefb9774b8f9463ffdfd8101000000
Worker ID: 21602417afb8b58af6db10cb511242afac87db0eb5b09f5606320616 
Node ID: 51e55bca411e9e811bf5c67089cbf9867f5f8374c2fce4a8370c987c 
Worker IP address: *
Worker port: *
Worker PID: 6218

I grepped through all the logs and stdout but couldn't find any information about what the error was or where it occurred.
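A way to get more than Ray's generic message, assuming (not confirmed in the issue) that the worker dies from a fatal signal in native code such as a CUDA or C++ extension, is to enable Python's standard-library faulthandler at worker startup so the Python stack is written to a file when the process crashes. enable_crash_traces and the worker_crash_<pid>.log path are hypothetical. If the kernel's OOM killer sends SIGKILL, no handler can run; the kernel log (dmesg) is the place to look in that case.

```python
"""Sketch: write a Python traceback to a file when a worker crashes.

faulthandler handles SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL. It cannot
catch SIGKILL (e.g. from the OOM killer). enable_crash_traces is hypothetical.
"""
import faulthandler
import os


def enable_crash_traces(log_dir: str = "/tmp"):
    """Send fatal-signal tracebacks to a per-process log file; return the file."""
    path = os.path.join(log_dir, f"worker_crash_{os.getpid()}.log")
    # Keep this file object alive: faulthandler writes to its fd at crash time.
    log_file = open(path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call first thing in the GPU worker's __init__ (hypothetical placement):
#     self._crash_log = enable_crash_traces()
```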

Too many values to unpack (expected 4) on LambdaLabs

Running on a fresh LambdaLabs instance (1x A100, 40 GB) raises an error in selfplay_worker.py:

ValueError: too many values to unpack (expected 4)

The cause is likely a version mismatch between gym and some other library whose version we don't pin in requirements.txt. Long term, we'll want to migrate to gymnasium and the latest versions of ALE, but for now we need to find which package causes the conflict and pin its version in requirements.txt.
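To find the conflicting package, one option is to record the installed versions of likely suspects on a working machine and on the broken LambdaLabs instance, then diff the two. A sketch using the standard-library importlib.metadata (available in the Python 3.8 shown in the traceback); the SUSPECTS list is a guess, not a confirmed culprit list.

```python
"""Sketch: report installed versions of packages that may conflict with gym.

Assumption: the SUSPECTS list is a guess; extend it as needed.
"""
from importlib import metadata
from typing import Dict, Iterable, Optional

SUSPECTS = ["gym", "ale-py", "atari-py", "autorom", "ray", "numpy"]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Output uses requirements.txt syntax so it can be pasted directly.
    for name, version in installed_versions(SUSPECTS).items():
        print(f"{name}=={version}" if version else f"# {name}: not installed")
```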

Full error below.

Traceback (most recent call last):
  File "/home/ubuntu/EfficientZero/core/selfplay_worker.py", line 107, in run
    self._run()
  File "/home/ubuntu/.local/share/virtualenvs/EfficientZero-KqajkB4Z/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "/home/ubuntu/EfficientZero/core/selfplay_worker.py", line 139, in _run
    init_obses = [env.reset() for env in envs]
  File "/home/ubuntu/EfficientZero/core/selfplay_worker.py", line 139, in <listcomp>
    init_obses = [env.reset() for env in envs]
  File "/home/ubuntu/EfficientZero/config/atari/env_wrapper.py", line 37, in reset
    observation = self.env.reset(**kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/EfficientZero-KqajkB4Z/lib/python3.8/site-packages/gym/core.py", line 311, in reset
    return self.observation(self.env.reset(**kwargs))
  File "/home/ubuntu/EfficientZero/core/utils.py", line 120, in reset
    obs = self.env.reset(**kwargs)
  File "/home/ubuntu/EfficientZero/core/utils.py", line 58, in reset
    return self.env.reset(**kwargs)
  File "/home/ubuntu/EfficientZero/core/utils.py", line 155, in reset
    return self.env.reset(**kwargs)
  File "/home/ubuntu/EfficientZero/core/utils.py", line 82, in reset
    obs, _, done, _ = self.env.step(self.noop_action)
  File "/home/ubuntu/.local/share/virtualenvs/EfficientZero-KqajkB4Z/lib/python3.8/site-packages/gym/wrappers/order_enforcing.py", line 13, in step
    observation, reward, done, info = self.env.step(action)
ValueError: too many values to unpack (expected 4)
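The last frame shows gym's own OrderEnforcing wrapper unpacking four values from an inner step() that returned more. A plausible reading, not confirmed in the issue, is that the inner environment follows the newer convention used by gym 0.26 onward and by gymnasium, where step() returns (obs, reward, terminated, truncated, info) and reset() returns (obs, info), while the installed gym wrapper still expects the older (obs, reward, done, info). The hypothetical helpers below normalize both forms for code we control, such as core/utils.py. They cannot fix the unpack inside gym's wrapper, which still needs compatible package versions.

```python
"""Sketch: accept both the old and new gym step()/reset() return shapes.

normalize_step and normalize_reset are hypothetical helpers, not part of gym.
"""
from typing import Any, Tuple


def normalize_step(result: Tuple) -> Tuple[Any, float, bool, dict]:
    """Convert a step() result to the old (obs, reward, done, info) form."""
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # Old-style `done` covers both episode termination and truncation.
        return obs, reward, bool(terminated or truncated), info
    if len(result) == 4:
        obs, reward, done, info = result
        return obs, reward, bool(done), info
    raise ValueError(f"unexpected step() result of length {len(result)}")


def normalize_reset(result: Any) -> Any:
    """Return only the observation, whether reset() gave obs or (obs, info)."""
    # Assumption: a 2-tuple whose second item is a dict is (obs, info).
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


# Example in a NoopReset-style wrapper (hypothetical placement):
#     obs, _, done, _ = normalize_step(self.env.step(self.noop_action))
```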
