powerai-image-memorability's Introduction

Predicting Image Memorability with MemNet in Keras on PowerAI

This code pattern will enable you to build an application that predicts how "unique" or "memorable" images are. You'll do this through the Keras deep learning library, using the MemNet architecture. The dataset this neural network will be trained on is called "LaMem" (Large-scale Image Memorability), by MIT. In order to process the 45,000 training images and 10,000 testing images (227x227 RGB) efficiently, we'll be training the neural network on a PowerAI machine on NIMBIX, enabling us to benefit from NVLink (direct CPU-GPU memory interconnect) without needing any extra code. Once the model has been trained on PowerAI, we'll convert it to a CoreML model and expose it via a web application written in Swift, running on a Kitura server on macOS.

When the reader has completed this pattern, they'll understand how to:

  • Train a Keras model on PowerAI.
  • Use a custom loss function with a Keras model.
  • Convert tf.keras models that deal with images to CoreML models.
  • Use the Apple Vision framework with a CoreML model in Swift to get VNCoreMLFeatureValueObservations.
  • Host a web server with Kitura.
  • Expose a Mustache template over HTTP through Kitura.
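
On the custom loss mentioned in the list above: the code quoted in the Issues section compiles the model with a function named euclidean_distance_loss, whose definition is not shown on this page. Assuming it is the usual Euclidean distance taken over the output axis (an assumption, not something confirmed here), it would read:

```latex
\mathcal{L}(y, \hat{y}) = \sqrt{\sum_{i=1}^{d} (\hat{y}_i - y_i)^2}
\qquad\text{and, for a single output } (d = 1):\qquad
\mathcal{L}(y, \hat{y}) = \sqrt{(\hat{y} - y)^2} = \lvert \hat{y} - y \rvert
```

Under that assumption, a network with a one-unit output layer is effectively trained on the absolute error of each prediction.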

Flow

TODO: add flow diagram

  1. A Keras model is trained with the LaMem dataset.
  2. The Keras model is converted to a CoreML model.
  3. The user uploads their image to the Kitura web app.
  4. The Kitura web app uses the CoreML model for predictions.
  5. The user receives the neural network's prediction.

Included Components

  • IBM Power Systems: A server built with open technologies and designed for mission-critical applications.
  • IBM PowerAI: A software platform that makes deep learning, machine learning, and AI more accessible and better performing.
  • Kitura: Kitura is a free and open-source web framework written in Swift, developed by IBM and licensed under Apache 2.0. It’s an HTTP server and web framework for writing Swift server applications.

Featured Technologies

  • Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
  • Swift on the Server: Build powerful, fast and secure server side Swift apps for the Cloud.

Prerequisites

  • If you don't already have a PowerAI server, you can acquire one from Nimbix or from the PowerAI offering on IBM Cloud.
  • macOS 10.13 (High Sierra) or later

Steps

  1. Clone the repo
  2. Download and extract the LaMem data
  3. Train the Keras model
  4. Convert the Keras model to a CoreML model
  5. Run the Kitura web app

1. Clone the repo

Clone the powerai-image-memorability repo onto both your PowerAI server and local macOS machine. In a terminal, run:

git clone https://www.github.com/IBM/powerai-image-memorability

2. Download and extract the LaMem data

To download the LaMem dataset, head over to the powerai_serverside directory, and run the following command:

wget http://memorability.csail.mit.edu/lamem.tar.gz

Once the dataset is done downloading, run the following command to extract that data:

tar -xvf lamem.tar.gz
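
After extraction, it can help to check that the split files are readable before starting a long training run. The sketch below assumes each line of a split file such as lamem/splits/train_1.txt holds an image name followed by a memorability score, separated by whitespace. That layout is an assumption, so check it against the extracted files. The function name parse_split_file is hypothetical and is not the repo's own loader.

```python
from pathlib import Path
from typing import List, Tuple


def parse_split_file(split_path: str) -> List[Tuple[str, float]]:
    """Parse a split file into (image_name, memorability_score) pairs.

    Assumes one "<image_name> <score>" pair per line (hypothetical layout).
    """
    pairs = []
    for line in Path(split_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the score is always last.
        name, score = line.rsplit(maxsplit=1)
        pairs.append((name, float(score)))
    return pairs
```

Calling parse_split_file("lamem/splits/train_1.txt") from the powerai_serverside directory and printing the length of the result is a quick way to confirm the expected number of training images.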

3. Train the Keras model

To train the Keras model, run the following command inside of the powerai_serverside directory:

python train.py

Once the Python script is done running, you'll see a memnet_model.h5 model in the powerai_serverside directory. Copy that over to the webapp directory on the macOS machine that you'd like to run the frontend on.
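
To get a feel for how long an epoch is, the arithmetic below works out the number of batches per epoch. The image counts come from the introduction. The batch size of 64 * 4 is taken from the code quoted in the Issues section, and it is an assumption that train.py uses the same value.

```python
# Dataset sizes come from the introduction; the batch size is an assumption
# borrowed from the code quoted in the Issues section.
TRAIN_IMAGES = 45_000
TEST_IMAGES = 10_000
BATCH_SIZE = 64 * 4  # 256


def steps_and_leftover(num_images: int, batch_size: int) -> tuple:
    """Return (full batches per epoch, images left over after those batches)."""
    steps = num_images // batch_size
    return steps, num_images - steps * batch_size


train_steps, train_leftover = steps_and_leftover(TRAIN_IMAGES, BATCH_SIZE)
test_steps, test_leftover = steps_and_leftover(TEST_IMAGES, BATCH_SIZE)
print(train_steps, train_leftover)  # 175 200
print(test_steps, test_leftover)    # 39 16
```

With these numbers, truncating the step count means 200 training images and 16 test images fall outside the full batches in each pass.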

4. Convert the Keras model to a CoreML model

Inside of the webapp directory on your macOS machine, run the following Python script to convert your Keras model to a CoreML model:

python convert_model.py memnet_model.h5

This may take a few minutes, but when you're done, you should see a lamem.mlmodel file in the webapp directory.

5. Run the Kitura web app

Then, you're ready to roll! Run the following command to build & run your application:

swift build && swift run

Now, you can head over to localhost:3333 in your favourite web browser, upload an image, and calculate its memorability.

TODO: add screenshot

powerai-image-memorability's People

Contributors

dolph, stevemart, tanmayb123

powerai-image-memorability's Issues

Predicting score problem

Hello, I am trying to use your code with plain Python, without Kitura and Swift, but no matter which image I use, every prediction has a memorability score of at least ~0.79; even a completely white image scores ~0.83. Please help me. Here is my code:

...
model = Sequential()
model.add(Conv2D(96, (11, 11), (4, 4), activation="relu", input_shape=(227, 227, 3)))
model.add(MaxPooling2D((3, 3), (2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(256, (5, 5), activation="relu"))
model.add(ZeroPadding2D((2, 2)))
model.add(MaxPooling2D((3, 3), (2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(384, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(384, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(MaxPooling2D((3, 3), (2, 2)))
model.add(GlobalAveragePooling2D())
model.add(Dense(4096, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(4096, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1))

train_split = load_split("../lamem/splits/train_1.txt")
test_split = load_split("../lamem/splits/test_1.txt")
batch_size = 64 * 4

train_gen = lamem_generator(train_split, batch_size=batch_size)
test_gen = lamem_generator(test_split, batch_size=batch_size)

model.compile("adam", euclidean_distance_loss)
model.fit(train_gen, steps_per_epoch=int(len(train_split) / batch_size), epochs=5, verbose=1, validation_data=test_gen, 
validation_steps=int(len(test_split) / batch_size))

model.save("memnet_model2.h5")
# I also tried saving the weights separately.
model.save_weights('memnet_model2_w')

then in other script:

import multiprocessing as mp

import numpy as np
import tensorflow as tf
from PIL import Image

# euclidean_distance_loss must be defined or imported in this script as well.
model = tf.keras.models.load_model('memnet_model2.h5', custom_objects={'euclidean_distance_loss': euclidean_distance_loss})
# I also tried load_weights, hoping it might help :(
# model.load_weights('memnet_model2_w')

def load_image(image_file):
    # Resize to the network's input size, force RGB, and scale pixels to [0, 1].
    return np.array(Image.open(image_file).resize((227, 227)).convert("RGB"), dtype="float32") / 255.

test_img = mp.Pool().map(load_image, ['predict/7.png'])
test_img = np.array(test_img)
print(test_img.shape)
# test_img.reshape(-1, 227, 227, 3)
print(np.array(test_img).shape)

prediction = model.predict(np.array(test_img))
print(prediction)
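
One way to narrow this down, sketched under assumptions: when nearly every image receives about the same score, the network may have collapsed to predicting something close to the average training label. Comparing the spread of the predictions with the spread of the labels makes that visible. The function names, the 0.25 threshold, and the sample numbers below are all hypothetical and made up for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def collapse_report(labels: Sequence[float], predictions: Sequence[float]) -> dict:
    """Summarise how the predictions compare with the label distribution."""
    return {
        "label_mean": mean(labels),
        "label_std": pstdev(labels),
        "pred_mean": mean(predictions),
        "pred_std": pstdev(predictions),
    }


def looks_collapsed(report: dict, ratio: float = 0.25) -> bool:
    """Heuristic: the predictions vary far less than the labels do."""
    return report["pred_std"] < ratio * report["label_std"]


# Made-up numbers for illustration only, not real LaMem scores or model output.
sample_labels = [0.55, 0.70, 0.85, 0.95, 0.60]
sample_predictions = [0.79, 0.80, 0.83, 0.81, 0.80]
report = collapse_report(sample_labels, sample_predictions)
print(report, looks_collapsed(report))
```

Running the same comparison on a few hundred real test images would show whether the trained model distinguishes images at all, or whether the problem lies in training rather than in the prediction script.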

Last layer activation

Hello,
The last layer's (linear) activation can give us values greater than 1. Shouldn't it be a sigmoid, to limit the regression values to between 0 and 1?
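
To illustrate the question, here is a minimal sketch of what a sigmoid does to an unbounded output. It uses plain Python rather than Keras, and the raw values are made up. In Keras the equivalent change would be Dense(1, activation="sigmoid"); whether that improves this model is not something this page establishes.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


raw_outputs = [-3.0, 0.0, 1.4, 6.0]  # made-up pre-activation values
for raw in raw_outputs:
    # A linear output passes the raw value through unchanged;
    # a sigmoid squashes it into the range between 0 and 1.
    print(f"linear: {raw:+.2f}  sigmoid: {sigmoid(raw):.4f}")
```

The linear column can leave the 0 to 1 range, while the sigmoid column cannot, which is the behaviour the question is asking for.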
