
# Pokemon Classification

This is a group project for ECE-GY 6143 Introduction to Machine Learning at NYU Tandon.

Pokemon, a popular TV show, video game, and trading card series, originated in Japan and has spread all over the world since 1996. A Pokedex, a device that exists in the world of Pokemon, recognizes Pokemon by scanning or learning from Pokemon images. The existing model we found on PyImageSearch trains a CNN with Keras and deep learning to serve as the underlying model of a Pokedex. Because of the limited size of its dataset and the single approach it applies, the model's accuracy is not ideal. We would like to build a better-performing model on top of the existing one by increasing the size of the dataset and introducing different approaches: a pre-trained VGG16 and PCA.

Group 77 members are as below:

  • Qin Hu (N17006855)
  • Bohan Zhang (N13992422)
  • Yichi Zhang (N19888469)
  • Xintong Song (N13489466)

This is a multi-class classification problem. One of the primary limitations of the original project is the small amount of training data. We tested on various images, and at times the classifications were incorrect. When this happened, we inspected the input images and the network more closely and found that the most dominant color(s) in an image dramatically influence the classification.

First, we decided to build a new and bigger dataset.
Because there are only a limited number of Pokemon pictures on Flickr, we used [Google Images](https://www.pyimagesearch.com/2017/12/04/how-to-create-a-deep-learning-dataset-using-google-images/) to build our dataset:

>Search for a certain Pokemon.

>Download the URLs of all images on the current webpage through the JavaScript console.

>Run dataset_factory/dataset_factory.py to download all images from those URLs.
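
A minimal sketch of the download step (the actual script lives in dataset_factory/dataset_factory.py; the urls.txt filename and output folder here are only illustrative):

```python
import os
import requests

def download_images(url_file, output_dir):
    """Download every image listed in url_file (one URL per line) into output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for i, url in enumerate(urls):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            with open(os.path.join(output_dir, f"{i:05d}.jpg"), "wb") as out:
                out.write(resp.content)
        except Exception as exc:
            print(f"skipping {url}: {exc}")

download_images("urls.txt", "dataset/pikachu")
```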

Our new dataset contains 6000+ images of 12 Pokemon.
Because of the large number of pictures, we used four Tesla P100 16 GB GPUs to train our networks in this project.
Then, we tested the dataset on the original CNN with a train : test split of 75% : 25%.
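
A minimal sketch of the split and baseline training, assuming the images and labels are already loaded into arrays `X` and `y` and `model` is the original Keras CNN; the batch size and random seed are illustrative:

```python
from sklearn.model_selection import train_test_split

# X: images of shape (N, 150, 150, 3); y: one-hot labels for the 12 classes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# `model` is the original Keras CNN from the PyImageSearch tutorial.
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    batch_size=32, epochs=600)
```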

After 600 epochs, the training accuracy reached 0.96, but the validation (test) accuracy stayed below 0.8, which is poor for image classification.
We then tried two different methods to deal with this multi-class classification problem.

Method 1 Description:

VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 0.92 top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. Thus, it is well suited to our problem.


We loaded the pre-trained VGG16 weights and added 4 extra layers on top of its convolutional base: 1 flatten layer, 2 fully connected layers, and 1 dropout layer, then trained the result on our dataset.
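
A minimal sketch of this transfer-learning setup in Keras; the dense-layer width, dropout rate, and optimizer are our assumptions, while the 150x150x3 input size matches the dataset described above:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model

# Load the VGG16 convolutional base pre-trained on ImageNet, without its classifier head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base.trainable = False  # keep the pre-trained weights frozen

# The 4 extra layers: 1 flatten, 2 fully connected, 1 dropout.
x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)          # fully connected layer (width is an assumption)
x = Dropout(0.5)(x)                           # dropout rate is an assumption
outputs = Dense(12, activation="softmax")(x)  # 12 Pokemon classes

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```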

Method 1 Results:
VGG16 significantly improved the test accuracy to 0.96.

Method 2 Description:

We use PCA to reduce the dimensionality of the dataset (each image is a 67,500-dimensional vector) while still achieving high accuracy.

1st Approach: Use GridSearchCV to search for the optimal parameters.

Due to the low efficiency of grid search, even an Nvidia Tesla P100 GPU can take hours to process the full-size dataset (6000, 150, 150, 3). Therefore, for this particular method we load only 4 of the 12 classes, 200 pictures per class. Each picture is shrunk to 50x50 pixels and converted to grayscale. These data are carefully picked so that there are fewer irrelevant pictures in the dataset. There are 2,500 features in total: (800, 50, 50) -> (800, 2500).

Parameters are chosen by observation. First, guess a range for the parameters (number of principal components, C, gamma). Fit the data using GridSearchCV, then observe the convergence tendency on the resulting color map, which allows a more "educated" guess. Fit the data again with the new parameter range, and repeat this process until the global maximum appears in the map.
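
A minimal sketch of one round of this search with scikit-learn, assuming `X` is the (800, 2500) matrix of flattened grayscale images and `y` holds the 4 class labels; the candidate parameter values are illustrative guesses that get narrowed between rounds:

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# X: (800, 2500) flattened 50x50 grayscale images; y: labels for the 4 classes.
pipe = Pipeline([("pca", PCA()), ("svc", SVC(kernel="rbf"))])

# Initial guesses for the number of principal components, C, and gamma;
# the ranges are refined on each round based on the accuracy color map.
param_grid = {
    "pca__n_components": [20, 50, 100, 200],
    "svc__C": [1, 10, 100],
    "svc__gamma": [1e-4, 1e-3, 1e-2],
}

search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```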

Results:

Limitations:
  • We need to sacrifice picture quality to compensate for the low efficiency of the algorithm.
  • Having less data to train on could lead to overfitting.
  • PCA can lose spatial information that is important for classification, so the classification accuracy decreases.
  • It is not as good as a CNN for this multi-class classification problem.

2nd Approach:

- Create two folders (train and test) and store all the Pokemon images of the nine selected kinds (Arcanine, Bulbasaur, Charizard, Eevee, Lucario, Mew, Pikachu, Squirtle, and Umbreon) into separate folders
- Use ImageDataGenerator to transform image data into data point matrices and combine train and test for scaling. At this point, the entire mini batch has 1000 images, each of which has a dimension of 67,500 (150 * 150 * 3)
- Use StandardScaler() to rescale the data X and fit PCA to find the minimum number of PCs that makes the proportion of variance (PoV) explained greater than or equal to 90%
- Create an array of number of PCs for test (2 to the minimum number we just found) and an array of Gamma.
- In the for loops over the candidate numbers of PCs and Gamma values, fit PCA on the training data and an SVM on the transformed training data in each iteration, and find the parameters that give the best accuracy (see the sketch below)
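
A minimal sketch of this parameter loop with scikit-learn, assuming `X_train`, `X_test`, `y_train`, and `y_test` are the flattened (n, 67500) image matrices and labels; the step over candidate PC counts and the gamma values are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X_train, X_test: flattened (n, 67500) image matrices; y_train, y_test: labels.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Minimum number of PCs whose cumulative explained variance (PoV) reaches 90%.
pov = np.cumsum(PCA().fit(X_train_s).explained_variance_ratio_)
min_pcs = int(np.argmax(pov >= 0.90)) + 1

best_acc, best_params = 0.0, None
for n_pcs in range(2, min_pcs + 1, 10):        # candidate numbers of PCs
    pca = PCA(n_components=n_pcs).fit(X_train_s)
    Z_train, Z_test = pca.transform(X_train_s), pca.transform(X_test_s)
    for gamma in [1e-4, 1e-3, 1e-2]:           # candidate gamma values
        clf = SVC(kernel="rbf", gamma=gamma).fit(Z_train, y_train)
        acc = clf.score(Z_test, y_test)
        if acc > best_acc:
            best_acc, best_params = acc, (n_pcs, gamma)

print(best_acc, best_params)
```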

Limitation: there are almost 70,000 features but only 1,000 data points. PCA works as a form of "feature selection" that removes noise and correlations within an image before applying any classifier. It does not work well in this case because some unusual images (e.g., a Pokemon printed on a T-shirt) are hard to detect.

Results:

Conclusion:

The results show that the CNN is the most effective method for this problem. With the implementation of VGG16 and the data generator, we improved the accuracy from 0.8 to 0.96, which is very good. The PCA methods do not work as well as the CNN, most likely because PCA loses some spatial information that is important for classification. In terms of efficiency, the CNN is also much faster than the PCA approaches on the same Nvidia P100 GPU.
