![](https://camo.githubusercontent.com/e76b6b9b5e1055551687d39eb71cf5f70effbcd4096b84e071f6c18e2e3d3c20/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f313430302f302a4f675f5741776d78434571412d65686e2e706e67)
```js
State = [
  isDirectionRight, isDirectionLeft, isDirectionDown, isDirectionUp,
  isAppleRight, isAppleLeft, isAppleDown, isAppleUp,
  isCollidingRight, isCollidingLeft, isCollidingDown, isCollidingUp
]
Action in [0, 1, 2, 3]
[dY, dX] = [[0, 1], [0, -1], [1, 0], [-1, 0]][Action]  // i.e. [dY, dX] = [Right, Left, Down, Up][Action]
```
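A minimal sketch of how such a state could be keyed into the Q table and how an action maps to a move (the helper names are illustrative, not the repo's actual API):

```js
// Hypothetical helpers: encode the 12 booleans as a compact Q-table key,
// and map an action index to a grid delta.
const DIRECTIONS = [
  [0, 1],  // 0: Right
  [0, -1], // 1: Left
  [1, 0],  // 2: Down
  [-1, 0], // 3: Up
];

// Encode the boolean state vector as a bit string usable as an object key.
function stateKey(state) {
  return state.map((b) => (b ? "1" : "0")).join("");
}

// Apply an action to the snake head's [y, x] position.
function move([y, x], action) {
  const [dY, dX] = DIRECTIONS[action];
  return [y + dY, x + dX];
}
```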
```js
// Exploration
if random() < epsilon
  choose random action
// Exploitation
else
  choose action that maximizes Q[current_state][action]
```
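In plain JavaScript, this epsilon-greedy policy could look like the sketch below (`qTable` and the zero-initialized defaults are assumptions, not the repo's actual code):

```js
// Epsilon-greedy action selection. `qTable` maps a state key to an array
// of 4 Q-values (one per action); `epsilon` is the exploration rate.
function chooseAction(qTable, key, epsilon) {
  if (Math.random() < epsilon) {
    // Exploration: pick one of the 4 actions uniformly at random.
    return Math.floor(Math.random() * 4);
  }
  // Exploitation: pick the action with the highest Q-value.
  const qValues = qTable[key] ?? [0, 0, 0, 0];
  return qValues.indexOf(Math.max(...qValues));
}
```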
- The more the AI trains, the more reward it collects per game (progress is logged via console.log)
- The Q table and training info are persisted in local storage (see the sketch below)
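A possible persistence sketch, assuming the Q table is a plain object (the storage keys here are hypothetical, not the ones the repo actually uses):

```js
// Save the Q table and a training counter to local storage.
function saveTraining(qTable, episodes) {
  localStorage.setItem("qTable", JSON.stringify(qTable));
  localStorage.setItem("episodes", String(episodes));
}

// Restore them on page load, falling back to an empty table.
function loadTraining() {
  return {
    qTable: JSON.parse(localStorage.getItem("qTable") ?? "{}"),
    episodes: Number(localStorage.getItem("episodes") ?? 0),
  };
}
```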
![](https://camo.githubusercontent.com/e47ef2f70dbc4919cb5be326837512aa614ba7ac0bcff66e3de3feb8cf775ec7/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f3732302f312a754c74634e42496d44456f31716355614f4b356e69772e77656270)
![](https://camo.githubusercontent.com/d6b3d3222a40978fb3758f53401953456d5b71129068a3eb4a883087e4668199/68747470733a2f2f6f70656e64696c61622e6769746875622e696f2f44492d656e67696e652f5f696d616765732f44514e2e706e67)
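For reference, a Q table like this is typically filled in with the standard tabular Q-learning update; a minimal sketch, where `ALPHA` and `GAMMA` are assumed hyperparameters rather than this project's settings:

```js
// Standard tabular Q-learning update:
// Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
const ALPHA = 0.1; // learning rate (illustrative value)
const GAMMA = 0.9; // discount factor (illustrative value)

function updateQ(qTable, key, action, reward, nextKey) {
  const q = qTable[key] ?? (qTable[key] = [0, 0, 0, 0]);
  const nextQ = qTable[nextKey] ?? [0, 0, 0, 0];
  q[action] += ALPHA * (reward + GAMMA * Math.max(...nextQ) - q[action]);
}
```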
![](./dfs.png)