Image Captioning Flask Web App

This repository contains a Flask web application built around an encoder-decoder image captioning model. Users upload an image through the web interface, and the model generates a caption for it. A pre-trained VGG model extracts the image features, and an LSTM-based decoder generates the caption from them.

Usage

  1. Clone this repository to your local machine:
       git clone https://github.com/K-1303/ImageCaptionGeneration
       cd ImageCaptionGeneration
  2. Install the required packages:
       pip install -r requirements.txt
  3. Run the Flask web application:
       python app.py
  4. Open your web browser and go to http://localhost:5000. The web application should now be running.
  5. Upload an image using the provided form. Click the "Generate Caption" button; the model processes the image and displays the generated caption below the uploaded image.
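
What happens after "Generate Caption" is clicked is not spelled out here. A common approach for this kind of model is greedy decoding: start from a start token, ask the model for the next-word distribution, append the most likely word, and repeat until an end token or the length limit is reached. The sketch below illustrates that loop under stated assumptions. The `startseq`/`endseq` tokens, the `greedy_caption` and `toy_predict` names, the left-padding, and the toy vocabulary are all hypothetical, and `toy_predict` merely stands in for the trained model.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical marker tokens; the repository may use different ones.
START, END = "startseq", "endseq"


def greedy_caption(
    predict_next: Callable[[Sequence[float], List[int]], Sequence[float]],
    image_features: Sequence[float],
    word_to_id: Dict[str, int],
    max_length: int,
) -> str:
    """Build a caption word by word, always taking the most likely next word."""
    id_to_word = {i: w for w, i in word_to_id.items()}
    token_ids = [word_to_id[START]]
    words: List[str] = []
    while len(token_ids) < max_length:
        # Assumption: left-pad with zeros so the input always has max_length ids.
        padded = [0] * (max_length - len(token_ids)) + token_ids
        probs = predict_next(image_features, padded)
        next_id = max(range(len(probs)), key=probs.__getitem__)
        word = id_to_word.get(next_id)
        if word is None or word == END:
            break
        words.append(word)
        token_ids.append(next_id)
    return " ".join(words)


def toy_predict(features: Sequence[float], padded_ids: List[int]) -> List[float]:
    """Stand-in for the trained model: follows a fixed script of next words."""
    script = {1: 3, 3: 4, 4: 2}  # startseq -> a -> dog -> endseq
    probs = [0.0] * 5
    probs[script[padded_ids[-1]]] = 1.0
    return probs


# Toy vocabulary; id 0 is reserved for padding.
vocab = {"startseq": 1, "endseq": 2, "a": 3, "dog": 4}
caption = greedy_caption(toy_predict, [0.0] * 4096, vocab, max_length=6)
print(caption)  # a dog
```

In a real application, `predict_next` would wrap the trained model's prediction call and `image_features` would be the 4096-dimensional vector extracted from the uploaded image.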

Image Captioning Model

The image captioning model used has the following architecture:

Encoder (Image Feature Extraction):
    Input Layer: A 4096-dimensional vector representing the image features.
    Dropout Layer: To prevent overfitting, with a dropout rate of 0.4.
    Dense Layer: Reduces the dimensionality of the image features to 256 units, using the ReLU activation function.

Encoder (Sequence Feature Extraction):
    Input Layer: Takes in the tokenized captions as input, with a shape of (max_length,).
    Embedding Layer: Converts integer-encoded tokens into dense vectors of 256 dimensions. The vocab_size parameter indicates the number of unique words in the vocabulary, and mask_zero=True is used to mask zero-padded tokens during training.
    Dropout Layer: Applied to the embedded sequences with a dropout rate of 0.4.
    LSTM Layer: A Long Short-Term Memory (LSTM) layer with 256 units to process the sequence of embedded tokens.
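
To make the (max_length,) input shape and the role of mask_zero=True concrete, here is a minimal sketch of how a caption might be turned into a fixed-length integer sequence. The `encode_caption` helper, the toy vocabulary, and the choice of left-padding are assumptions for illustration, not the repository's code. Because id 0 is reserved for padding, real word ids start at 1, which usually means the embedding's vocab_size is the number of unique words plus one.

```python
from typing import Dict, List


def encode_caption(caption: str, word_to_id: Dict[str, int], max_length: int) -> List[int]:
    """Map known words to integer ids and left-pad with zeros to max_length."""
    ids = [word_to_id[w] for w in caption.lower().split() if w in word_to_id]
    ids = ids[-max_length:]  # keep at most max_length tokens
    return [0] * (max_length - len(ids)) + ids


# Hypothetical toy vocabulary; id 0 is reserved for padding.
word_to_id = {"startseq": 1, "a": 2, "dog": 3, "runs": 4, "endseq": 5}
vocab_size = len(word_to_id) + 1  # +1 accounts for the padding id 0

encoded = encode_caption("startseq a dog runs", word_to_id, max_length=6)
print(encoded)  # [0, 0, 1, 2, 3, 4]
```

The zeros at the front are the positions that the Embedding layer masks, so the LSTM only processes the four real tokens.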

Decoder Model:
    Merge: Combines the image features and sequence features by element-wise addition (add operation). Both branches output 256-dimensional vectors, so their shapes match.
    Dense Layer: Processes the combined features with 256 units and ReLU activation.
    Output Layer: A Dense layer with vocab_size units and a softmax activation function to generate the probability distribution over the vocabulary.
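
The layers described above can be assembled with the Keras functional API roughly as follows. This is a sketch reconstructed from the description, not the repository's actual code. The `build_caption_model` name and the example values for vocab_size and max_length are placeholders, and training settings such as the loss and optimizer are left out because they are not stated here.

```python
from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding, Input, add
from tensorflow.keras.models import Model


def build_caption_model(vocab_size: int, max_length: int) -> Model:
    # Image branch: 4096-d feature vector -> dropout -> 256-d dense projection.
    image_input = Input(shape=(4096,))
    image_branch = Dropout(0.4)(image_input)
    image_branch = Dense(256, activation="relu")(image_branch)

    # Sequence branch: token ids -> 256-d embeddings -> dropout -> LSTM.
    caption_input = Input(shape=(max_length,))
    seq_branch = Embedding(vocab_size, 256, mask_zero=True)(caption_input)
    seq_branch = Dropout(0.4)(seq_branch)
    seq_branch = LSTM(256)(seq_branch)

    # Decoder: element-wise add, dense layer, softmax over the vocabulary.
    merged = add([image_branch, seq_branch])
    merged = Dense(256, activation="relu")(merged)
    outputs = Dense(vocab_size, activation="softmax")(merged)

    return Model(inputs=[image_input, caption_input], outputs=outputs)


# Placeholder sizes; the real values depend on the training captions.
model = build_caption_model(vocab_size=5000, max_length=35)
model.summary()
```

The model takes two inputs, an image feature vector and a partial caption, and returns one probability per vocabulary word for the next token.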
