skyzyx / make-meme-text-searchable

Read the text of memes, then inject that text into the image as searchable metadata.


aws exif exif-data exif-metadata images machine-learning machinelearning meme memes photos rekognition rekognition-text

Make Meme Text Searchable

I have an extensive set of memes I've been collecting since the early days of Flickr. #icanhascheeseburger

It's a pain in the ass not being able to search my memes (now stored in Apple's Photos.app) for the one I'm looking for when I need it.

This project uses Amazon Rekognition to read the text from the images, then uses go-exif to write that text into the image metadata as a caption/description. Photo apps and services should be able to parse and index this data, making your images (memes, really) searchable by the text they contain.

General Program Flow

1. Read the binary image data into memory.
2. Rekognition only supports PNG and JPEG formats, so…
    1. If the image is already a PNG or JPEG, skip to the next step.
    2. If the image is in GIF, WEBP, or HEIF format, convert the in-memory representation of the image to JPEG.
3. Submit the (PNG or JPEG) bytes of the image to Rekognition.
4. Get back the results, then merge, deduplicate, and munge the resulting text matches, whatever they are. Words don't always come back in the right order, so think of them less as a sentence and more as a collection of keywords. For example:

    Wait a minute, I had something for this.

    …might become…

    a for had i minute something this wait

5. Using the EXIF library, write these words into the ImageDescription EXIF field of the file's metadata (in the copy we read into memory).
6. Optionally, we can:
    1. Overwrite the original file with an updated description (destructive).
    2. Read the file from one location and write an updated copy to a new location (non-destructive).
        1. This new location can even be a different format, such as JPEG or PNG: whatever Go's standard library supports.
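The merge/deduplicate/munge step above can be approximated with the standard library alone. This is a minimal sketch under stated assumptions, not the project's `meme.GetSanitizedText`: it assumes the Rekognition detections have already been flattened into a slice of strings. Rekognition's DetectText reports each LINE and each WORD as separate detections, so the same word usually shows up more than once, which is why de-duplication matters. The helper name `sanitizeWords` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// sanitizeWords is a hypothetical helper, not part of the meme package.
// It lowercases every detection, drops apostrophes, turns all other
// non-alphanumeric characters into spaces, de-duplicates the resulting
// words, and returns them sorted and joined by single spaces.
func sanitizeWords(detections []string) string {
	seen := make(map[string]struct{})
	for _, text := range detections {
		cleaned := strings.Map(func(r rune) rune {
			switch {
			case r == '\'' || r == '\u2019':
				return -1 // drop apostrophes so "don't" stays one word
			case unicode.IsLetter(r) || unicode.IsNumber(r):
				return unicode.ToLower(r)
			default:
				return ' ' // punctuation and symbols become word breaks
			}
		}, text)
		for _, word := range strings.Fields(cleaned) {
			seen[word] = struct{}{}
		}
	}

	words := make([]string, 0, len(seen))
	for word := range seen {
		words = append(words, word)
	}
	sort.Strings(words)
	return strings.Join(words, " ")
}

func main() {
	// A LINE detection plus some of its WORD detections (hypothetical input).
	detections := []string{"Wait a minute, I had something for this.", "Wait", "minute,", "this."}
	fmt.Println(sanitizeWords(detections)) // a for had i minute something this wait
}
```

Dropping apostrophes rather than splitting on them keeps "don't" as a single searchable token (`dont`); treating every non-alphanumeric character as a break would produce `don` and `t` instead.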

Why Go?

Go (aka "Golang") compiles down to a static binary and is stupidly fast. It can also compile to WebAssembly, which means it can run in Node.js or web browsers. It also has the fastest boot time on AWS Lambda (more or less tied for first place with Rust), so wrapping this in a POST endpoint should be practical as well.

Someday, I want to learn how to develop mobile apps so that I can solve this user problem. Compiled Go code can be called from Android and iOS.

Progress

Library

Importable with go get.
Handles image bytestreams and decoding.
Handles converting the image to JPEG before passing to Rekognition.
Sends data to Rekognition.
Parses the results from Rekognition into words.
Converts the image bytestream to JPEG before sending to Rekognition.
Preserves any existing EXIF data.
Writes the words into the EXIF data.

CLI Tool

Usage

Library

Incomplete example. Error handling removed for brevity.

package main

import (
    "context"
    "os"

    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/skyzyx/make-meme-text-searchable/meme"
)

func main() {
    ctx := context.Background()

    // Load AWS credentials and region from the default provider chain.
    // (Assumption: meme.DetectText expects an aws-sdk-go-v2 aws.Config.)
    awsConfig, _ := config.LoadDefaultConfig(ctx)

    // Open the file as an io.Reader.
    r, _ := os.Open("./images/paris-airport.heic")
    defer r.Close()

    // Read the io.Reader, decode the image, then re-encode the image data as JPEG format.
    buf, _, _, _ := meme.ReadImage(r, meme.DefaultJPEGQuality)

    // Pass the image data to AWS Rekognition.
    results, _ := meme.DetectText(ctx, &awsConfig, buf)

    // Sanitize, de-dupe, remove punctuation, and sort the resulting words.
    words := meme.GetSanitizedText(results)

    // Write the string of words back to the image into the EXIF ImageDescription field.
    _ = meme.WriteImageDescription(r, words)
}

CLI

(Brainstorming) Something like…

meme-text [--report=TEXT|JSON] [--out=FILE] [--outdir=DIR] [--outformat=GIF|HEIC|JPG|PNG|WEBP] [--quiet] [--verbose] [--force] INPUT...

INPUT is one or more files, directories of files, or globs of files. Supports: GIF, HEIC, JPEG, PNG, WEBP. Also works with STDIN.
--report will write data to STDOUT in the specified format.
--quiet will silence all output.
--verbose may be specified up to 3 times, with increasing levels of verbosity. The default value is equivalent to WARNING. -v, -vv, and -vvv are equivalent to INFO, DEBUG, and TRACE (respectively).
--force disables any interactive prompts.
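The brainstormed verbosity rules boil down to a count-to-level mapping. A minimal sketch, assuming hand-rolled argument scanning: Go's standard `flag` package would look up `-vv` as a separate flag named `vv`, so it doesn't support this style out of the box. In this sketch `--quiet` wins regardless of position, and counts above three are clamped to TRACE; both are design choices, not settled behavior. The names `levelFromArgs` and `"QUIET"` are hypothetical, since the CLI doesn't exist yet.

```go
package main

import (
	"fmt"
	"strings"
)

// levelFromArgs is a hypothetical helper for the brainstormed CLI. It maps
// the verbosity flags in args to a log level name; "QUIET" is a made-up
// name for the --quiet case.
func levelFromArgs(args []string) string {
	count := 0
	for _, arg := range args {
		switch {
		case arg == "--quiet":
			return "QUIET" // --quiet silences everything, wherever it appears
		case arg == "--verbose":
			count++
		case len(arg) > 1 && arg[0] == '-' && arg[1] != '-' && strings.Trim(arg[1:], "v") == "":
			count += len(arg) - 1 // "-v", "-vv", "-vvv": one level per "v"
		}
	}

	levels := []string{"WARNING", "INFO", "DEBUG", "TRACE"}
	if count >= len(levels) {
		count = len(levels) - 1 // anything past three is still TRACE
	}
	return levels[count]
}

func main() {
	fmt.Println(levelFromArgs([]string{"-vv", "meme.png"})) // DEBUG
}
```

A lone `-` (a common way to say "read STDIN") is deliberately not counted as a verbosity flag.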

Web UI

This should be relatively simple to write, as long as people can drag-and-drop/upload their images into the webpage, then provide an email address to send the results to (asynchronously). A simple desktop app is also possible, maybe with Electron, Wails, or Tauri.

Things to read

Things I need to read and understand. Apparently writing EXIF data can be non-trivial.
