unique-image-scan's People

Contributors

danielhoherd, mgallizzi

unique-image-scan's Issues

Plan features that we want

Stepping back from where the project currently stands, let's break down what we want and what we can do.

This ticket describes goals for an MVP, not myriad future integrations.

Goals

These goals are subject to change during discussion.

The primary goal is to identify unique media (images and movie files), discover duplicates, and build a list of them.

Sub-goals:

  • Provide a list of duplicates, including a listing of their metadata differences.
  • Provide a list of images where only the image data differs and the metadata is identical (for example, resized versions or a different file type, such as a DNG exported to JPG).
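Listing metadata differences between two duplicates could be as simple as comparing two flat tag dictionaries. This is a minimal sketch, assuming metadata has already been extracted into one dict per file (for example by an EXIF reader); `metadata_diff` is a hypothetical helper, not part of the project, and the sample tags are illustrative.

```python
from typing import Any, Dict, Optional, Tuple


def metadata_diff(
    a: Dict[str, Any], b: Dict[str, Any]
) -> Dict[str, Tuple[Optional[Any], Optional[Any]]]:
    """Return {tag: (value_in_a, value_in_b)} for every tag whose value differs.

    Hypothetical helper. A tag present in only one file is reported with None
    on the other side, so a stripped file shows up as a list of missing tags.
    """
    diff = {}
    for tag in sorted(a.keys() | b.keys()):
        va, vb = a.get(tag), b.get(tag)
        if va != vb:
            diff[tag] = (va, vb)
    return diff


if __name__ == "__main__":
    # Illustrative values: an original and a copy with face boxes added and GPS removed.
    original = {"Model": "iPhone 12", "GPSPosition": "40.7, -74.0"}
    edited = {"Model": "iPhone 12", "RegionInfo": "face boxes"}
    print(metadata_diff(original, edited))
    # {'GPSPosition': ('40.7, -74.0', None), 'RegionInfo': (None, 'face boxes')}
```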

Non-goals

  • Alter any metadata in any source media files
  • Remove or link any source media files on disk

Definitions

Unique Media

"Unique image" is defined as just the image data itself, not metadata. That is, we only care about the raw pixels in the images. If any picture data is different, it's not an identical image.

The following are examples of unique images:

  • Images that have identical DateCreated-type fields but different image data. (E.g., an iPhone HDR photo may produce this scenario.)
  • Images that have different image data but identical ShutterCount. (E.g., images exported from DNG to JPG may have identical metadata.)
  • Images that have identical mtime/ctime but different image data.
  • Images that have identical image data but have a different ShutterCount or ImageNumber or equivalent field.

Duplicate Media

Duplicate images may or may not have different metadata.

The following are examples of duplicate images:

  • Two files that differ only because one has had facial recognition boxes added.
  • Two files that differ only by mtime and/or ctime.
  • Two files where one has no EXIF data and the other has a variety of metadata.
  • Two files where one has an audio and a video stream, and the other has the same audio and video streams plus a subtitle stream.

Methods for determining duplicate media

  • Read the image data or video stream and hash all or part of that data.
  • Compare EXIF/IPTC/etc. data for fields that should be unique and read-only per file. (E.g., GPSPosition may be added, removed, or altered during editing, but Model should not change.)
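The two methods above could be combined: group files by a content hash, then flag any group whose supposedly read-only fields disagree. This is a sketch under the assumption that each file has already been reduced to a record holding a hash of its image data or stream plus a metadata dict; `MediaRecord`, `find_duplicates`, `read_only_conflicts`, and `READ_ONLY_FIELDS` are hypothetical, and `Model` is the only read-only field the issue itself names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

# Fields assumed to stay fixed across edits; the issue names Model as one example.
READ_ONLY_FIELDS = ("Model",)


@dataclass
class MediaRecord:
    path: str
    content_hash: str  # hash of all or part of the image data / video stream
    metadata: Dict[str, str] = field(default_factory=dict)


def find_duplicates(records: List[MediaRecord]) -> List[List[MediaRecord]]:
    """Group records that share a content hash; only groups of two or more are returned."""
    groups: Dict[str, List[MediaRecord]] = defaultdict(list)
    for rec in records:
        groups[rec.content_hash].append(rec)
    return [g for g in groups.values() if len(g) > 1]


def read_only_conflicts(group: List[MediaRecord]) -> List[str]:
    """Return read-only fields whose values differ within a duplicate group.

    A missing field is ignored, since a file with no EXIF data can still be a duplicate.
    """
    conflicts = []
    for name in READ_ONLY_FIELDS:
        values = {r.metadata[name] for r in group if name in r.metadata}
        if len(values) > 1:
            conflicts.append(name)
    return conflicts


if __name__ == "__main__":
    recs = [
        MediaRecord("a.jpg", "h1", {"Model": "X100V", "GPSPosition": "1,2"}),
        MediaRecord("b.jpg", "h1", {}),  # same image data, EXIF stripped
        MediaRecord("c.jpg", "h2", {"Model": "X100V"}),
    ]
    for group in find_duplicates(recs):
        print([r.path for r in group], read_only_conflicts(group))  # ['a.jpg', 'b.jpg'] []
```

A non-empty conflict list suggests the shared hash may be a collision or a partial-hash false positive worth re-checking with a full hash.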

Next steps

  • Rename scan verb to exif-scan or similar #9
  • Add a media-scan (or similarly named) verb for hashing the image/video stream
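If the tool were a Python CLI built on argparse (an assumption; this issue does not say how verbs are implemented), the proposed split into two verbs might look like the sketch below. The verb names come from this issue; the `paths` argument and the placeholder dispatch in `main` are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="unique-image-scan")
    sub = parser.add_subparsers(dest="verb", required=True)

    exif = sub.add_parser("exif-scan", help="find duplicates by comparing metadata fields")
    exif.add_argument("paths", nargs="+")

    media = sub.add_parser("media-scan", help="find duplicates by hashing the image/video stream")
    media.add_argument("paths", nargs="+")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # Placeholder dispatch: real handlers would walk args.paths and scan each file.
    return f"{args.verb}: {len(args.paths)} path(s)"


if __name__ == "__main__":
    # Demo invocation; a real entry point would call main() with no arguments.
    print(main(["media-scan", "photos/"]))  # media-scan: 1 path(s)
```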

Add summary of metadata that was searched

It would be great to see some stats about the metadata that was searched.

For example:

  • Camera models, lens models, shutter speed, f-stop, and ISO, with how many photos had each value
  • Count of photos with no EXIF data
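Counting values per tag across all scanned files maps naturally onto `collections.Counter`. This is a minimal sketch, assuming each file's metadata is a dict keyed by tag name and that an empty dict means the file had no EXIF data; `summarize_metadata` is hypothetical, and the ExifTool-style tag names in `SUMMARY_TAGS` are chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# ExifTool-style tag names, chosen for illustration.
SUMMARY_TAGS = ["Model", "LensModel", "ExposureTime", "FNumber", "ISO"]


def summarize_metadata(files: Iterable[Dict[str, object]]) -> Tuple[Dict[str, Counter], int]:
    """Count how many files had each value of each summary tag, plus files with no EXIF."""
    per_tag: Dict[str, Counter] = {tag: Counter() for tag in SUMMARY_TAGS}
    no_exif = 0
    for meta in files:
        if not meta:  # empty dict: treated as "no EXIF data at all"
            no_exif += 1
            continue
        for tag in SUMMARY_TAGS:
            if tag in meta:
                per_tag[tag][meta[tag]] += 1  # tag values must be hashable
    return per_tag, no_exif


if __name__ == "__main__":
    counts, no_exif = summarize_metadata([
        {"Model": "X100V", "ISO": 160},
        {"Model": "X100V", "ISO": 400},
        {},
    ])
    for tag, counter in counts.items():
        for value, n in counter.most_common():
            print(f"{tag}={value}: {n}")
    print(f"no EXIF: {no_exif}")
```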
