tjwei / 2048-nn

This project forked from ovolve/2048-ai

A Deep Learning AI for 2048 (2048: 94.15%, 4096: 78.48%, 8192: 34.5%, 16384: 0.177%)

License: MIT License


2048-nn's Introduction

A 2048 Deep Learning AI (that does not suck)

I didn't think a neural network was particularly suitable for solving the 2048 puzzle, but I implemented one anyway. It turns out the result is not too bad, and at least much better than I expected.

See it in action at http://tjwei.github.io/2048-NN/ and on video at https://www.youtube.com/watch?v=oRC2W38lxIE

Neural Network AI for the game 2048.

It is a fork of http://ov3y.github.io/2048-AI/ with the AI part replaced by a neural network.

The neural network is pretrained using Theano and Lasagne by observing simulated games, supervised by an advanced AI: https://github.com/nneonneo/2048-ai

The trained neural network reaches 2048 or higher in >94% of games, 4096 in >78% of games, and 8192 in >34% of games. There is a small chance (0.1%~0.2%) that the AI reaches a max tile of 16384.

It may sometimes end with an embarrassingly low score.

The following is the result of 100K games played by the neural network:

max tile	% of games	accumulated %	reversed accumulated %
16384	0.177%	0.177%	100.000%
8192	34.500%	34.677%	99.823%
4096	43.819%	78.496%	65.323%
2048	15.656%	94.152%	21.504%
1024	4.340%	98.492%	5.848%
512	1.144%	99.636%	1.508%
256	0.261%	99.897%	0.364%
128	0.064%	99.961%	0.103%
64	0.026%	99.987%	0.039%
32	0.007%	99.994%	0.013%
16	0.005%	99.999%	0.006%
8	0.001%	100.000%	0.001%
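The two accumulated columns follow directly from the "% of games" column, and the short sketch below recomputes them as a sanity check. The variable names are made up for this illustration and are not taken from the repository; only the percentages are copied from the table.

```python
from itertools import accumulate

# Per-max-tile percentages copied from the table above (largest tile first).
max_tile_pct = {
    16384: 0.177, 8192: 34.500, 4096: 43.819, 2048: 15.656,
    1024: 4.340, 512: 1.144, 256: 0.261, 128: 0.064,
    64: 0.026, 32: 0.007, 16: 0.005, 8: 0.001,
}

tiles = list(max_tile_pct)
pcts = list(max_tile_pct.values())

# "accumulated %": share of games whose max tile is at least this tile.
reached_at_least = [round(x, 3) for x in accumulate(pcts)]

# "reversed accumulated %": share of games whose max tile is at most this tile.
reached_at_most = [round(x, 3) for x in accumulate(reversed(pcts))][::-1]

for tile, pct, up, down in zip(tiles, pcts, reached_at_least, reached_at_most):
    print(f"{tile:>6} {pct:7.3f}% {up:8.3f}% {down:8.3f}%")
```

Read this way, the "accumulated %" entry for 2048 is the success rate quoted above (games that reach 2048 or higher), and the "reversed accumulated %" entry for 1024 is the share of games that never get there.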

or as a graph:

The average score is 85351.8 and the average length of a game is 3733.6 steps.

The following graph shows how many games are still running after a given number of steps in the 100K simulations:

The graph indicates that the AI performs relatively weakly in the early stage of the game.

Even without human-made features and heuristics like those used in https://github.com/nneonneo/2048-ai , the same network architecture can still reach a success rate of at least 75%.

A much smaller model, trained without any human-made features or heuristics, reaches a 47% success rate; it can be found at http://github.com/tjwei/rl/

The animationDelay is set to 10. You can make it run faster or slower with a different delay time.

A reinforcement learning algorithm using a convolutional neural network is also provided.

To run https://github.com/tjwei/2048-NN/blob/master/my2048-rl-theano-n-tuple-Copy7.ipynb you need selenium, firefox, lasagne, and cudnn.

It is likely to reach the 2048 tile in fewer than 100 games of learning (sometimes fewer than 50 games).

With the learning rate of Adam changed to 0.0001, it should reach 2048 in more than 60% of games.

In my experiment, after 15K training games, it achieves an average of 1600+ steps and a score of 30000+.

It can consistently reach 2048 in 75%+ of games, with a peak success rate of 85%+.

The following plots show the average steps and average score during training (for every 100 training games).
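For readers who want to produce similar curves from their own training logs, averaging over every 100 games can be sketched as below. This is only an illustration under assumptions: `block_means` and `steps_per_game` are hypothetical names, the numbers are synthetic, and the notebook may compute its plots differently.

```python
def block_means(values, block=100):
    """Average consecutive, non-overlapping blocks of `block` values.

    The final block may be shorter if len(values) is not a multiple of `block`.
    """
    means = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        means.append(sum(chunk) / len(chunk))
    return means


# Synthetic per-game step counts, for illustration only (not real training data).
steps_per_game = [100 + game for game in range(250)]
print(block_means(steps_per_game))
```

Each returned value corresponds to one point on the plot; the same helper can be applied to per-game scores.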

This beats "An early attempt at applying deep reinforcement learning to the Game 2048" by Gui et al., which reaches 2048 in 7% of games.

But it is still not as good as n-tuple approaches (Szubert et al., Wu, I-C et al.), which can reach a 95%+ success rate.

This network seems to work better with a larger N_FILTERS and a smaller learning rate, but I have not yet run that experiment.


2048-nn's Issues

no module name c2048 , how do i install c2048 in windows version of anaconda?

how do i use this?

I've downloaded everything but I don't know how to run it, sorry if it's a stupid question.

The probabilities are off

Hey @tjwei ! I think you may be surprised at my score...

...isn't the max score 16384?

NN AI wont download model

I've downloaded the zip file, unpacked it and opened index.html with Firefox.

It won't download. It only shows:

Downloading model......(about 25M).
NN AI won't work until the model is loaded

all the time

Convert to WASM!

uhh lets make it uh... faster !! :)))

Can you make this code for windows ?

AI does not work on safari

AI does not seem to work on Safari on a mac

Getting this error

ImportError Traceback (most recent call last)
in ()
----> 1 from lasagne.layers.dnn import Conv2DLayer
2 from lasagne.regularization import regularize_network_params, l1, l2, regularize_layer_params_weighted

~\Anaconda3\lib\site-packages\lasagne\layers\dnn.py in ()
40 else:
41 raise ImportError(
---> 42 "requires GPU support -- see http://lasagne.readthedocs.org/en/"
43 "latest/user/installation.html#gpu-support") # pragma: no cover
44

ImportError: requires GPU support -- see http://lasagne.readthedocs.org/en/latest/user/installation.html#gpu-support

For training, what did you use for determining reward?

I'm working on a similar project in my free time, and am curious how much info should go into the reward function, how heavily to weight certain actions or failures, etc.

Suggestion: Training outputs

I'm assuming that when training the AI, it had only 4 outputs (north,south,east,west). Apologies if that's a faulty assumption.

For training the network, you might consider training it for a few additional outputs - not because you need them to play the game, but because by needing to provide them, the network will need an additional awareness of the game mechanics.
4 outputs (N/S/E/W) for when a move in that direction is possible, 0 if not possible.
4 outputs (N/S/E/W) for when a move in that direction will merge tiles.
1 output for a complexity score of the remaining tiles, after whatever mergers happen for the requested move.

The 'human' analog of training for these outputs would be 'learning the rules of the game' - It'll know when tiles merge and it'll know when moves are invalid, and it may internalize some of the logic for that in its decision process for moves.
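To make the suggested extra targets concrete, here is a minimal sketch of how the eight flag outputs could be computed for one position. It rests on several assumptions that are not from the repository: the board is a 4x4 list of tile values with 0 for an empty cell, N/S/E/W mean up/down/right/left, and all function names are hypothetical. The "complexity score" is left out because the suggestion does not define it.

```python
def slide_row_left(row):
    """Slide one line to the left with 2048 rules; return (new_row, merged_flag)."""
    tiles = [v for v in row if v != 0]
    out, merged, i = [], False, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)  # equal neighbours merge once
            merged = True
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out)), merged


def lines_for_direction(board, direction):
    """Return the board's lines oriented so the move becomes a slide to the left."""
    if direction == "W":
        return [list(r) for r in board]
    if direction == "E":
        return [list(reversed(r)) for r in board]
    columns = [list(c) for c in zip(*board)]
    if direction == "N":
        return columns
    if direction == "S":
        return [list(reversed(c)) for c in columns]
    raise ValueError(f"unknown direction: {direction}")


def auxiliary_targets(board):
    """Return (can_move, will_merge): dicts keyed by N/S/E/W with 0/1 values."""
    can_move, will_merge = {}, {}
    for d in "NSEW":
        moved = merged = False
        for line in lines_for_direction(board, d):
            new_line, line_merged = slide_row_left(line)
            moved = moved or new_line != line
            merged = merged or line_merged
        can_move[d] = int(moved)
        will_merge[d] = int(merged)
    return can_move, will_merge


board = [
    [2, 2, 4, 8],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
can_move, will_merge = auxiliary_targets(board)
print(can_move)
print(will_merge)
```

In a multi-task setup these flags would be extra training targets next to the move output; whether they actually help this network is an open question that the thread does not settle.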