Burton2000 commented on August 22, 2024

Hey, sorry for the delayed reply.

The RNN/LSTM captioning models are trained on last-layer CNN features extracted from the COCO images. These features, not the raw images, are the dataset given to us.

To test your own images, you would have to extract features from your test image with the same CNN model they used. They use a VGG16 model pretrained on ImageNet, so the extraction should be doable, but it will take time to learn if you aren't familiar with this sort of thing.

On top of that, they have reduced the dimension further from 4096 to 512 using PCA (check the section under Microsoft COCO in the RNN_captioning notebook). It is probably best to just retrain using the full 4096 features (but then you would have to change things in the assignment to match this new input size).
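To make the PCA step concrete, here is a minimal NumPy sketch of fitting a projection on training features and applying that same projection to a new vector. `fit_pca` and `apply_pca` are hypothetical helpers, and the random arrays are small stand-ins (64 and 8 dimensions instead of 4096 and 512). A PCA fit on different data gives a different projection, so this will not reproduce the reduced features from the assignment unless you have the mean and components that were used originally.

```python
import numpy as np


def fit_pca(features, n_components):
    """Fit PCA on an (N, D) feature matrix; return (mean, components)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(features, mean, components):
    """Project (N, D) features onto the fitted components; returns (N, K)."""
    return (features - mean) @ components.T


rng = np.random.default_rng(0)
train_feats = rng.normal(size=(200, 64))   # stand-in for (N, 4096) features
mean, comps = fit_pca(train_feats, 8)      # stand-in for 512 components

new_feat = rng.normal(size=(1, 64))        # stand-in for one test image
reduced = apply_pca(new_feat, mean, comps)
print(reduced.shape)  # (1, 8)
```

The key point is that `mean` and `comps` come from the training features and are reused unchanged for every new image.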

If you can extract the feature vector from your image, then checking the caption is the simple bit. The code below shows how to do it; just place it in a cell right at the end of the notebook.

# Load an image
test_im = plt.imread('./kitten.jpg')
plt.imshow(test_im)
plt.show()

# Feature extraction of the image
# TODO

# I'm just using a placeholder input of shape (1, 512) to show that the code below will work.
test_input = np.ones([1, 512])

# Forward pass of the model.
cap_sample = small_rnn_model.sample(test_input)
cap_sample = decode_captions(cap_sample, data['idx_to_word'])
print(cap_sample)

from cs231n-2017.
