ahmdtaha / finegrainedvisualrecognition

Fine grained visual recognition tensorflow baseline on CUB, Stanford Cars, Dogs, Aircrafts, and Flower102.

License: Apache License 2.0

FGVR Tensorflow Baseline

This repo contains training code for FGVR classification using TensorFlow. This implementation achieves results comparable to the state of the art.

Check out the Wiki for more technical discussion.

FGVR is a classification task where inter-category visual differences are small and can be overwhelmed by factors such as pose, viewpoint, or the location of the object in the image. For instance, the following image shows a California gull (left) and a Ring-billed gull (right). The difference in beak pattern is the key to a correct classification, yet it is tiny compared to intra-category variations such as pose and illumination. FGVR datasets typically involve animal species or models of cars and aircraft. The following table lists six well-known FGVR datasets.

Dataset        Num Classes  Avg Samples per Class  Train Size  Val Size  Test Size
Flowers-102    102          10                     1020        1020      6149
CUB-200-2011   200          29.97                  5994        N/A       5794
Stanford Cars  196          41.55                  8144        N/A       8041
NABirds        550          43.5                   23929       N/A       24633
Aircrafts      100          33.34                  3334        3333      3333
Stanford Dogs  120          100                    12000       N/A       8580
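
The "Avg samples per class" column appears to be the training set size divided by the number of classes. The short sketch below recomputes it from the table values as a sanity check; the helper name avg_samples_per_class is made up for this illustration and is not part of the repo.

```python
# (name, num_classes, train_size) copied from the table above.
DATASETS = [
    ("Flowers-102", 102, 1020),
    ("CUB-200-2011", 200, 5994),
    ("Stanford Cars", 196, 8144),
    ("NABirds", 550, 23929),
    ("Aircrafts", 100, 3334),
    ("Stanford Dogs", 120, 12000),
]


def avg_samples_per_class(num_classes, train_size):
    """Average number of training images per class (hypothetical helper)."""
    return train_size / num_classes


for name, num_classes, train_size in DATASETS:
    avg = avg_samples_per_class(num_classes, train_size)
    print("{:<14} {:.2f}".format(name, avg))
```

With only 10 to 100 training images per class, these datasets are small by image-classification standards.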

Requirements

  • Python 3+ [Tested on 3.4.7]
  • Tensorflow 1+ [Tested on 1.8]

Datasets

I prepare my datasets in an unconventional way. The dataset_sample folder provides an example for the cars dataset. Instead of the Caffe style, which lists files and labels in a txt file, I use a csv file; to me, reading dataset content in Excel is more appealing than reading a txt file. To use the Caffe txt dataset style, modify CarsTupleLoader and BaseTupleLoader. This should be trivial since these classes return a list of filenames and labels.
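
As a rough illustration of the two listing styles, the sketch below reads either a csv file or a Caffe-style txt file into the same pair of lists (filenames and labels). It is not the repo's CarsTupleLoader or BaseTupleLoader code: the function names and the csv column names file_name and label are assumptions, so adjust them to match your own files.

```python
import csv


def load_csv_listing(csv_path, file_col="file_name", label_col="label"):
    """Read a csv listing and return (filenames, labels).

    The column names are hypothetical; match them to the header row
    of your own csv file.
    """
    filenames, labels = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            filenames.append(row[file_col])
            labels.append(int(row[label_col]))
    return filenames, labels


def load_caffe_listing(txt_path):
    """Read a Caffe-style listing: one '<image path> <integer label>' per line."""
    filenames, labels = [], []
    with open(txt_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last run of whitespace so paths may contain spaces.
            path, label = line.rsplit(None, 1)
            filenames.append(path)
            labels.append(int(label))
    return filenames, labels
```

Because both functions return the same structure, a loader class could switch between the two formats without changing the code that consumes the lists.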

Preliminary Results

Augmentation is limited to random cropping and horizontal flipping. No color distortion, vertical flipping, or other complex augmentation is employed. The results are preliminary because I did not wait until max_iters. Results for other datasets and other models, such as ResNet50, will be added later.
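
To make the two augmentations concrete, here is a dependency-free sketch that applies a random crop followed by a random horizontal flip to an image stored as a list of pixel rows. It is only an illustration, not the repo's TensorFlow pipeline: the function names, the crop size, and the 0.5 flip probability are all assumptions.

```python
import random


def random_crop(image, crop_h, crop_w, rng=random):
    """Return a crop_h x crop_w window taken at a random position.

    `image` is a list of rows; each row is a list of pixels.
    """
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, height - crop_h)   # randint bounds are inclusive
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def random_horizontal_flip(image, prob=0.5, rng=random):
    """Mirror the image left-to-right with probability `prob` (assumed 0.5)."""
    if rng.random() < prob:
        return [row[::-1] for row in image]
    return image


def augment(image, crop_h, crop_w, rng=random):
    """Random crop, then random horizontal flip."""
    cropped = random_crop(image, crop_h, crop_w, rng)
    return random_horizontal_flip(cropped, rng=rng)


# Usage: a 4x6 "image" whose pixels are just their own index.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patch = augment(image, crop_h=3, crop_w=3, rng=random.Random(0))
print(len(patch), len(patch[0]))
```

Passing a seeded random.Random instance as rng makes the augmentation reproducible, which helps when debugging a data pipeline.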

Dataset        DenseNet161  ResNet50 V2
Flowers-102    93.39        85.59
CUB-200-2011   82.2         69.43
Stanford Cars  91.13        86.84
NABirds        78.80        65.06
Aircrafts      88.65        83.49
Stanford Dogs  81.60        70.36

Running

base_config contains all the parameters needed to train the model. The main function in fgvr_train.py shows how to set these parameters.

Run python fgvr_train.py

Credits:

The following deserve credit for the tips and help provided to finish this code and achieve the reported results

TODO LIST

  • [Done] Write a Wiki about Accumulated Gradient in Tensorflow
  • [Done] Write a Wiki about Batch normalization and how to train and evaluate concurrently.
  • [Done] Report results of these fgvr datasets
  • [Done] Add ResNet implementation
  • [Done] Add other dataset loaders
  • Add separate evaluation.py to eval a trained model
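
The accumulated-gradient item above refers to a general technique: gradients from several small batches are summed before a single weight update, which emulates a larger batch when memory is limited. The sketch below shows the idea on a one-parameter least-squares model in plain Python. It is not the Wiki's TensorFlow code, and the function names and toy data are made up for the illustration.

```python
def mean_gradient(w, batch):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, num_micro_batches):
    """Average the gradients of equally sized micro-batches."""
    size = len(batch) // num_micro_batches
    if size * num_micro_batches != len(batch):
        raise ValueError("batch must split evenly into micro-batches")
    total = 0.0
    for i in range(num_micro_batches):
        micro = batch[i * size:(i + 1) * size]
        total += mean_gradient(w, micro)  # accumulate; no weight update yet
    return total / num_micro_batches      # gradient for one weight update


# Toy data: (x, y) pairs roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full = mean_gradient(0.5, data)
accum = accumulated_gradient(0.5, data, num_micro_batches=2)
print(abs(full - accum) < 1e-9)
```

For this simple model the accumulated gradient matches the full-batch gradient up to floating-point error. Layers that depend on batch statistics, such as batch normalization, do not give an exact match.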

Contributor list

  1. Ahmed Taha

Tips to improve the code and pull requests are both very welcome.
