mkduer / reinforcement-learning-snake-game
(2018) Project to create a snake game, train it with reinforcement learning, and see if it learns to play
License: Apache License 2.0
low priority
Up and down controls for the human player are reversed. Controls are also sluggish.
Relevant to #36
These can be added to the paper with more work, given extra time:
Discarded:
Epsilon
0.1 seems to be the best Epsilon value
===
Discarded
Learning Rate 0.005
Discount Factor 0.9
Successful Learning
Using a quick solution to reset the game until #8 is completed, in order to train the AI and test the Q-learning algorithm's state updates.
given extra time run 1,000,000+ episodes to see just how well the snake can learn (e.g. in terms of peak score, in terms of game converging)
adjusting board size (larger, smaller, regular)
reward tests #73
test vs training: run the test multiple times on constant hyperparameters to check for consistency and overfitting (note: append to file for this), possibly also with a changed board size
Discarded Tests:
Save final training data and run test
Single Episode
timing (start, finish, total length)
steps/frames
total successful eats aka score
body collisions
wall collisions
hyperparameters
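The per-episode items listed above could be collected in a small record like the one below. This is only a sketch: the class name `EpisodeStats`, its field names, and the `end` helper are hypothetical and not taken from the project.

```python
import time
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class EpisodeStats:
    """Hypothetical container for the data tracked during a single episode."""
    hyperparameters: dict                       # e.g. learning rate, epsilon, discount factor
    start: float = field(default_factory=time.time)
    finish: Optional[float] = None
    steps: int = 0                              # steps/frames taken
    score: int = 0                              # total successful eats
    body_collisions: int = 0
    wall_collisions: int = 0

    def end(self, now: Optional[float] = None) -> None:
        """Mark the episode as finished (uses the current time by default)."""
        self.finish = time.time() if now is None else now

    @property
    def total_length(self) -> Optional[float]:
        """Elapsed seconds, or None while the episode is still running."""
        if self.finish is None:
            return None
        return self.finish - self.start


stats = EpisodeStats(hyperparameters={"epsilon": 0.1}, start=10.0)
stats.steps += 3
stats.score += 1
stats.wall_collisions += 1
stats.end(now=12.5)
record = asdict(stats)  # plain dict, convenient for appending to a results file
```

Keeping the hyperparameters inside each record means a saved row can be interpreted without knowing which test run produced it.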
for producing proof of concept (in other words, keeping things simple to start)
Simple Q-learning algorithm (with η = 1):
Q(s, a) = Q(s, a) + (Q(s, a) + γ [max_a' Q(s', a')] - Q(s, a))
Epsilon-Greedy Q-learning algorithm:
Q(s, a) = (1 - ε) Q(s, a) + η (Q(s, a) + γ [max_a' Q(s', a')] - Q(s, a))
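For comparison, here is a minimal sketch of the textbook tabular Q-learning update with epsilon-greedy action selection. It is not the project's code and it differs from the formulas as written above in two ways, both assumptions on my part: the immediate reward r takes the place of the inner Q(s, a) term, and ε is used only to choose actions, not as a factor in the update. The action names and function names are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]  # assumed action set


def make_q_table():
    """Q-table mapping state -> {action: value}, with unseen entries at 0."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q_table[state]
    return max(ACTIONS, key=lambda a: values[a])


def update(q_table, state, action, reward, next_state, learning_rate, discount):
    """Q(s,a) <- Q(s,a) + eta * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state].values())
    td_target = reward + discount * best_next
    q_table[state][action] += learning_rate * (td_target - q_table[state][action])
    return q_table[state][action]


q = make_q_table()
first = update(q, "s0", "up", reward=100, next_state="s1",
               learning_rate=0.005, discount=0.9)
second = update(q, "s1", "left", reward=-10, next_state="s0",
                learning_rate=1.0, discount=0.9)
greedy = choose_action(q, "s0", epsilon=0.0)
```

With a learning rate of 1, as in the simple variant, the stored value is overwritten by the target r + γ max Q(s', a') on every update.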
DONE
We need to come up with a way to have the game not completely close after each episode. My idea would be to have the game boot with a prompt (press start) and then, after each collision, go back to the prompt while feeding the data to the Q-table class.
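One way to picture this is an outer loop that resets the game state after every collision instead of exiting. The sketch below uses a hypothetical `StubGame` stand-in (the real game's class and method names are not known here) and a callback in place of the Q-table class.

```python
import random


class StubGame:
    """Hypothetical stand-in for the snake game; ends an episode at random."""

    def __init__(self, rng):
        self.rng = rng
        self.steps = 0

    def reset(self):
        """Return to the starting state instead of closing the program."""
        self.steps = 0

    def step(self):
        """Advance one frame; returns False once a collision has occurred."""
        self.steps += 1
        return self.rng.random() >= 0.2  # assumed 20% collision chance per step


def run_episodes(game, episodes, on_episode_end):
    """Play several episodes back to back, reporting each one when it ends."""
    for episode in range(episodes):
        game.reset()
        while game.step():
            pass
        on_episode_end(episode, game.steps)  # e.g. hand data to the Q-table


results = []
run_episodes(StubGame(random.Random(0)), 5,
             lambda episode, steps: results.append((episode, steps)))
```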
Testing closer to 0.6 and 0.75
The optimal value seems to be 0.85-0.95
Fixes needed (related to #29):
This would likely have a broader spread and look more interesting visually. Depending on how the other graphs look, this may go from being an extra time item to an implement item.
DEFAULT VALUES for all tests (excepting the one being tested):
Learning Rate: 0.005
Epsilon: 0.1
Discount Factor: 0.9
Rewards: Wall -100, Body -100, Empty -10, Mouse 100
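These defaults could be kept in one place so that each test varies a single value and leaves the rest untouched. A minimal sketch, assuming hypothetical names (`Rewards`, `Hyperparameters`, `sweep`) that do not come from the project:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Rewards:
    wall: int = -100
    body: int = -100
    empty: int = -10
    mouse: int = 100


@dataclass(frozen=True)
class Hyperparameters:
    learning_rate: float = 0.005
    epsilon: float = 0.1
    discount_factor: float = 0.9
    rewards: Rewards = field(default_factory=Rewards)


def sweep(name, values, base=Hyperparameters()):
    """Build one config per value, changing only the named hyperparameter."""
    return [replace(base, **{name: value}) for value in values]


# Example values only; the actual rates tested are not listed here.
configs = sweep("learning_rate", [0.001, 0.01])
```

Because the dataclasses are frozen, a config cannot be modified by accident partway through a test run.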
Generate lines for both steps and scores.
As learning rate had a noticeable impact on the snake, we are increasing the number of rates that are tested.
For 20,000 total episodes (originally 5,000), prior to setting epsilon. Changed to 20,000 because of interesting results from the 0.01 learning rate at around and after 10,000 runs, where the learning actually seems to devolve a bit.
Modifications need to be made to the snake program to store the output of the Q-table as a digit to be interpreted as a movement in the game. I can do this over the break in order to start with a basic proof of concept. It wouldn't have any learning, but it could be set up to make a series of random movements.
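A minimal sketch of the digit-to-movement idea, with random digits standing in for Q-table output. The digit assignments, the `Move` enum, and the coordinate convention (y grows downward, as in most 2D screen coordinates) are all assumptions, not the project's actual encoding.

```python
import random
from enum import IntEnum


class Move(IntEnum):
    """Assumed digit encoding for the four movements."""
    UP = 0
    DOWN = 1
    LEFT = 2
    RIGHT = 3


# (dx, dy) per move, assuming y increases toward the bottom of the board.
DELTAS = {
    Move.UP: (0, -1),
    Move.DOWN: (0, 1),
    Move.LEFT: (-1, 0),
    Move.RIGHT: (1, 0),
}


def random_move(rng=random):
    """Pick a random digit; a placeholder until the Q-table supplies one."""
    return rng.randrange(len(Move))


def apply_move(position, digit):
    """Interpret a digit as a movement and return the new head position."""
    dx, dy = DELTAS[Move(digit)]
    x, y = position
    return (x + dx, y + dy)


rng = random.Random(1)
position = (5, 5)
for _ in range(3):  # a short series of random movements, with no learning
    position = apply_move(position, random_move(rng))
```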
Results