Comments (3)
I've added a temporary fix that catches the exception when it occurs and restarts training from the previous state, keeping the model history intact.
A proper fix is still needed in sapai/sapai-gym.
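A minimal sketch of the catch-and-restart idea, not the actual code in train_agent.py. The names `train_with_restarts`, `train_once`, and `find_latest_checkpoint` are hypothetical: `train_once` stands in for whatever loads a checkpoint (or starts fresh when given None) and runs training, and `find_latest_checkpoint` for whatever locates the most recently saved model.

```python
from typing import Callable, Optional


def train_with_restarts(
    train_once: Callable[[Optional[str]], str],
    find_latest_checkpoint: Callable[[], Optional[str]],
    max_restarts: int = 5,
) -> str:
    """Run training, resuming from the newest checkpoint after each failure.

    Both callables are assumptions for illustration:
    - train_once(checkpoint) trains from `checkpoint` (None means from scratch)
      and returns the path of the final model; it may raise mid-run.
    - find_latest_checkpoint() returns the newest saved checkpoint, or None.
    """
    checkpoint = find_latest_checkpoint()
    for attempt in range(max_restarts + 1):
        try:
            return train_once(checkpoint)
        except Exception as err:
            if attempt == max_restarts:
                # Give up and surface the original error.
                raise
            # Pick up whatever was saved before the crash and try again.
            checkpoint = find_latest_checkpoint()
            print(f"Training failed ({err!r}); restarting from {checkpoint}")
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Capping the number of restarts keeps a persistent bug from looping forever, which a bare `while True` around the training call would not.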
from super-ml-pets.
As I assumed all errors were coming from sapai-gym, I added a fix to catch all errors happening there:
andreped/sapai-gym@7443f36
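For readers who do not want to open the commit, the general pattern of catching environment errors can be sketched as below. This is an illustration only and is not the content of the linked commit: the class name `SafeStepEnv` and its attributes are hypothetical, and it assumes the older gym step API that returns `(obs, reward, done, info)`.

```python
class SafeStepEnv:
    """Wrap an environment so an exception in step() ends the episode.

    Hypothetical helper: assumes `env` exposes reset() and a 4-tuple step().
    """

    def __init__(self, env, error_reward=0.0):
        self.env = env
        self.error_reward = error_reward
        self.last_obs = None
        self.error_count = 0  # how many steps were cut short by an error

    def reset(self):
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action):
        try:
            obs, reward, done, info = self.env.step(action)
        except Exception as err:
            # Treat the failure as a terminal step and reuse the last
            # valid observation so the caller still gets a well-formed tuple.
            self.error_count += 1
            return self.last_obs, self.error_reward, True, {"error": repr(err)}
        self.last_obs = obs
        return obs, reward, done, info
```

Swallowing every exception hides the underlying bug, so counting the errors (and logging `info["error"]`) is worth keeping until the root cause is fixed.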
However, to my surprise, when running a regular training run (now without the try/except loop in the main training script train_agent.py), I got an error from within sb3. This is more challenging to solve, and I'm not sure what is causing it. See the traceback below, raised after about 250k steps:
Traceback (most recent call last):
File ".\main.py", line 28, in <module>
train_with_masks(ret)
File "C:\Users\andrp\workspace\super-ml-pets\src\train_agent.py", line 60, in train_with_masks
model.learn(total_timesteps=ret.nb_steps, callback=checkpoint_callback)
File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 579, in learn
self.train()
File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 439, in train
values, log_prob, entropy = self.policy.evaluate_actions(
File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\common\maskable\policies.py", line 280, in evaluate_actions
distribution.apply_masking(action_masks)
File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\common\maskable\distributions.py", line 152, in apply_masking
self.distribution.apply_masking(masks)
File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\sb3_contrib\common\maskable\distributions.py", line 62, in apply_masking
super().__init__(logits=logits)
File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\torch\distributions\categorical.py", line 64, in __init__
super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
File "C:\Users\andrp\workspace\super-ml-pets\venv38\lib\site-packages\torch\distributions\distribution.py", line 55, in __init__
raise ValueError(
ValueError: Expected parameter probs (Tensor of shape (64, 213)) of distribution MaskableCategorical(probs: torch.Size([64, 213]), logits: torch.Size([64, 213])) to satisfy the constraint Simplex(), but found invalid values:
tensor([[4.9590e-11, 2.1976e-10, 6.1887e-01, ..., 3.3524e-13, 4.5890e-12,
5.3164e-14],
[1.4266e-06, 8.7648e-10, 1.3233e-06, ..., 1.5695e-07, 2.9451e-08,
1.5212e-07],
[2.2623e-06, 2.3994e-09, 5.3787e-07, ..., 3.9735e-08, 2.8777e-09,
2.6170e-08],
...,
[1.6828e-12, 4.9032e-04, 9.5983e-13, ..., 1.7402e-13, 1.9223e-13,
5.6725e-14],
[4.7819e-10, 7.7589e-03, 7.8509e-18, ..., 6.4911e-11, 8.8994e-12,
8.3013e-11],
[3.6789e-08, 1.2760e-07, 4.7924e-16, ..., 8.6682e-09, 8.6489e-10,
3.7913e-08]], grad_fn=<SoftmaxBackward0>)
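The values printed above all look finite and non-negative, so one possibility (an assumption, not a confirmed diagnosis) is that some rows do not sum to 1 closely enough. A small sketch for locating such rows in a dumped probability matrix follows; `find_invalid_rows` is a hypothetical helper, and the default tolerance of 1e-6 is assumed to mirror the one used by PyTorch's simplex constraint.

```python
import math
from typing import List, Sequence


def find_invalid_rows(
    probs: Sequence[Sequence[float]], tol: float = 1e-6
) -> List[int]:
    """Return indices of rows that are not valid probability vectors.

    A row is flagged if it contains NaN/inf, contains a negative entry,
    or its sum differs from 1 by `tol` or more (tolerance is an assumption).
    """
    bad = []
    for i, row in enumerate(probs):
        not_finite = not all(math.isfinite(p) for p in row)
        # Short-circuit so fsum is never called on non-finite values.
        if (
            not_finite
            or any(p < 0 for p in row)
            or abs(math.fsum(row) - 1.0) >= tol
        ):
            bad.append(i)
    return bad
```

Calling it on the batch of probabilities (converted to nested lists, for example with `tensor.tolist()`) shows whether the failure comes from NaNs or from rounding drift in the row sums.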
Random exceptions seem to occur after training for thousands of steps:
Exception: get_idx < pet-hedgehog 10-1 status-honey-bee 2-1 > not found
What is causing this?