skyzyx / make-meme-text-searchable

Read the text of memes, then inject that text into the image as searchable metadata.


Make Meme Text Searchable

I have an extensive set of memes I've been collecting since the early days of Flickr. #icanhascheeseburger

It's a pain in the ass to not be able to search my memes (now stored in Apple’s Photos.app) to find what I'm looking for when I need it.

I had something for this.

This project uses Amazon Rekognition to read the text from the images, then go-exif to write the text into the image metadata as a caption/description. Photo apps and services should be able to parse and index this data, making your images (memes, really) searchable by the text that's in the image.

General Program Flow

  1. Read the binary image data into memory.

  2. Rekognition only supports PNG and JPEG formats, so…

    1. If the image is already a PNG or JPEG, skip to the next step.

    2. If the image is in GIF, WebP, or HEIF format, convert the in-memory representation of the image to JPEG format.

  3. Submit the (PNG or JPEG) bytes of the image to Rekognition.

  4. Get back the results. Merge, deduplicate, and munge the resulting text matches, whatever they are. Words don't always come back in the right order, so think of them less as a sentence and more as a collection of keywords.

    Wait a minute, I had something for this.

    …might become…

    a for had i minute something this wait

  5. Using the EXIF library, write these words into the ImageDescription EXIF field of the file (the one we read into memory).

  6. Optionally, we can:

    1. Overwrite the original file with an updated description (destructive).

    2. Read the file from one location, and write an updated copy to a new location (non-destructive).

      1. The new copy can even be in a different format, such as JPEG or PNG: whatever Go's standard library supports.
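
Step 4 of the flow above (merge, deduplicate, and sort the detected text) can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `sanitizeWords` is a hypothetical name, and it assumes the detected lines have already been extracted from the Rekognition response as plain strings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// sanitizeWords is a hypothetical helper (not the project's API). It lowercases
// the detected lines, drops punctuation, removes duplicates, and sorts the words.
func sanitizeWords(lines []string) string {
	seen := make(map[string]struct{})
	words := []string{}

	for _, line := range lines {
		// Split on anything that is not a letter or digit, which discards punctuation.
		fields := strings.FieldsFunc(strings.ToLower(line), func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		for _, w := range fields {
			if _, ok := seen[w]; !ok {
				seen[w] = struct{}{}
				words = append(words, w)
			}
		}
	}

	sort.Strings(words)
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(sanitizeWords([]string{"Wait a minute,", "I had something for this."}))
	// Prints: a for had i minute something this wait
}
```

One trade-off of splitting on every non-alphanumeric character is that contractions such as "don't" become two tokens ("don" and "t"). Whether that matters depends on how the photo app indexes the description.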

Why Go?

Go (aka, "Golang") compiles down to a static binary, and is stupidly fast. It can also compile to WebAssembly, which means it can run in Node.js or web browsers. It also has the fastest boot time on AWS Lambda (more-or-less tied for first place with Rust), so creating a POST endpoint should be easy as well.

Someday, I want to learn how to develop mobile apps so that I can solve this user problem. Compiled Go code can be called from Android and iOS.

Progress

Library

  • Importable with go get.
  • Handles image bytestreams and decoding.
  • Handles converting the image to JPEG before passing to Rekognition.
  • Sends data to Rekognition.
  • Parses the results from Rekognition into words.
  • Converts the image bytestream to JPEG before sending to Rekognition.
  • Preserves any existing EXIF data.
  • Writes the words into the EXIF data.

CLI Tool

  • Supports reading a file.
  • Supports reading a directory.
  • Supports reading a glob.
  • Supports verbose logging.
  • Supports AWS credentials as environment variables.
  • Supports AWS credentials as a profile reference.
  • Supports -v.
  • Supports -vv and -vvv.
  • Supports -q.
  • Supports outputting a copy to a new directory.
  • Supports outputting a copy in a new format.
  • Supports writing the Rekognition results into EXIF data at all.
  • Supports status updates for jobs.
  • Supports an index of already-processed images to facilitate restarting a failed queue.

Usage

Library

Incomplete example. Error handling removed for brevity.

package main

import (
    "context"
    "os"

    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/skyzyx/make-meme-text-searchable/meme"
)

func main() {
    ctx := context.Background()

    // Assumption: the AWS config comes from the AWS SDK for Go v2 default loader.
    awsConfig, _ := config.LoadDefaultConfig(ctx)

    // Open the file as an io.Reader.
    r, _ := os.Open("./images/paris-airport.heic")
    defer r.Close()

    // Read the io.Reader, decode the image, then re-encode the image data as JPEG format.
    buf, _, _, _ := meme.ReadImage(r, meme.DefaultJPEGQuality)

    // Pass the image data to AWS Rekognition.
    results, _ := meme.DetectText(ctx, &awsConfig, buf)

    // Sanitize, de-dupe, remove punctuation, and sort the resulting words.
    words := meme.GetSanitizedText(results)

    // Write the string of words back to the image into the EXIF ImageDescription field.
    _ = meme.WriteImageDescription(r, words)
}

CLI

(Brainstorming) Something like…

meme-text [--report=TEXT|JSON] [--out=FILE] [--outdir=DIR] [--outformat=GIF|HEIC|JPG|PNG|WEBP] [--quiet] [--verbose] [--force] INPUT...
  • INPUT is one or more files, directories of files, or globs of files. Supports: GIF, HEIC, JPEG, PNG, WEBP. Also works with STDIN.
  • --report will write data to STDOUT in the specified format.
  • --quiet will silence all output.
  • --verbose may be specified up to 3 times, with increasing levels of verbosity. The default value is equivalent to WARNING. -v, -vv, and -vvv are equivalent to INFO, DEBUG, and TRACE (respectively).
  • --force disables any interactive prompts.

Web UI

This should be relatively simple to write as long as people can drag-and-drop/upload their images into the webpage, then provide an email address to send the results to (asynchronously). A simple desktop app is also possible, maybe with Electron, Wails, or Tauri?

Things to read

Things I need to read and understand. Apparently writing EXIF data can be non-trivial.
