Metadata Index Structure for CAS

This project organizes and indexes files in a Content Addressable Storage (CAS) system. Metadata is stored in separate plain-text index files, each keyed by the file's SHA-256 hash, which is always written in uppercase and truncated to 16 characters. The indexed metadata covers current and previous locations, resolution, audio codec, video codec, bitrate, an MD5 hash mapping (also uppercase and truncated to 16 characters), preference (a ranking from 1 to 9), source URLs, modification times (mtime), file extensions (validated against the MIME type), MIME types as reported by the file command, deleted files, lost files, and tags. Tags come in two kinds: user tags (utags), derived from directory names, and file tags (ftags), derived from filenames.

Directory Structure

Store all hash-keyed metadata index files in a structured directory under /index/sha256. The people.txt index contains no hashes and sits directly under /index.

/index
  /sha256
    /vid
      vcodec.txt
      acodec.txt
      vres.txt
    /img
      vres.txt
    /aud
      acodec.txt
      bitrate.txt
    loc_current.txt
    loc_previous.txt
    md5.txt
    pref.txt
    source_url.txt
    mtime.txt
    ext.txt
    mime.txt
    deleted.txt
    lost.txt
    utags.txt
    ftags.txt
  people.txt
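
The layout above could be created programmatically. The following is a minimal sketch, not part of the project itself: the `INDEX_LAYOUT` table and the `create_index_tree` helper are hypothetical names, and the demo builds the tree in a temporary directory rather than at /index.

```python
import tempfile
from pathlib import Path

# Hypothetical table mirroring the directory listing: subdirectory -> index files.
INDEX_LAYOUT = {
    "sha256/vid": ["vcodec.txt", "acodec.txt", "vres.txt"],
    "sha256/img": ["vres.txt"],
    "sha256/aud": ["acodec.txt", "bitrate.txt"],
    "sha256": [
        "loc_current.txt", "loc_previous.txt", "md5.txt", "pref.txt",
        "source_url.txt", "mtime.txt", "ext.txt", "mime.txt",
        "deleted.txt", "lost.txt", "utags.txt", "ftags.txt",
    ],
    "": ["people.txt"],
}


def create_index_tree(index_root: Path) -> list[Path]:
    """Create every directory and an empty file for each index; return the file paths."""
    created = []
    for subdir, names in INDEX_LAYOUT.items():
        directory = index_root / subdir
        directory.mkdir(parents=True, exist_ok=True)
        for name in names:
            path = directory / name
            path.touch(exist_ok=True)  # leaves existing index files untouched
            created.append(path)
    return created


# Demo in a throwaway directory so nothing is written under the real /index.
with tempfile.TemporaryDirectory() as tmp:
    created = create_index_tree(Path(tmp) / "index")
    count = len(created)
    has_people = (Path(tmp) / "index" / "people.txt").exists()
```

Because both `mkdir` and `touch` are called with `exist_ok=True`, running the helper again on an existing tree leaves existing index files unchanged.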

Metadata Index Files

  1. /index/sha256/vid/vcodec.txt
ABCDEF1234567890 H.264
1234567890ABCDEF MP4
  2. /index/sha256/vid/acodec.txt
ABCDEF1234567890 AAC
1234567890ABCDEF MP3
  3. /index/sha256/vid/vres.txt
ABCDEF1234567890 1920x1080=2073600
1234567890ABCDEF 1280x720=921600
  4. /index/sha256/img/vres.txt
ABCDEF1234567890 4000x3000=12000000
1234567890ABCDEF 1920x1080=2073600
  5. /index/sha256/aud/acodec.txt
ABCDEF1234567890 AAC
1234567890ABCDEF MP3
  6. /index/sha256/aud/bitrate.txt
ABCDEF1234567890 320kbps
1234567890ABCDEF 128kbps
  7. /index/sha256/loc_current.txt
ABCDEF1234567890 /directory/to/file...ABCDEF1234567890...txt
1234567890ABCDEF /another/path/to/file...1234567890ABCDEF...txt
  8. /index/sha256/loc_previous.txt
ABCDEF1234567890 /old/directory/to/file...ABCDEF1234567890...txt
1234567890ABCDEF /previous/path/to/file...1234567890ABCDEF...txt
ABCDEF1234567890 /another/old/path/to/file...ABCDEF1234567890...txt
  9. /index/sha256/md5.txt
ABCDEF1234567890 ABCDEF1234567890
1234567890ABCDEF 1234567890ABCDEF
  10. /index/sha256/pref.txt
ABCDEF1234567890 9
1234567890ABCDEF 1
  11. /index/sha256/source_url.txt
ABCDEF1234567890 http://example.com/source1
1234567890ABCDEF http://example.com/source2
  12. /index/sha256/mtime.txt
ABCDEF1234567890 20230710123000
ABCDEF1234567890 20230711143000
1234567890ABCDEF 20230710123000
1234567890ABCDEF 20230712123000
  13. /index/sha256/ext.txt
ABCDEF1234567890 mp4
1234567890ABCDEF jpg
  14. /index/sha256/mime.txt
ABCDEF1234567890 video/mp4
1234567890ABCDEF image/jpeg
  15. /index/sha256/deleted.txt
ABCDEF1234567890 20230715123000
1234567890ABCDEF 20230716143000
  16. /index/sha256/lost.txt
ABCDEF1234567890 20230720123000
1234567890ABCDEF 20230721143000
  17. /index/sha256/utags.txt
ABCDEF1234567890 DIRECTORY
1234567890ABCDEF ANOTHER
1234567890ABCDEF DIRECTORY
  18. /index/sha256/ftags.txt
ABCDEF1234567890 this
ABCDEF1234567890 file
1234567890ABCDEF name
1234567890ABCDEF file
  19. /index/people.txt
John Doe, dob=19800101, occupation=actor
Jane Smith, dob=19750101, occupation=actress
Alice Johnson, dob=19900101, occupation=director
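
The hash-keyed index files above share one line format: a truncated hash, a space, then a value, with a hash allowed to repeat (as in loc_previous.txt and mtime.txt). A reader for that format might look like the sketch below. `parse_index` is a hypothetical helper, and it assumes the value is everything after the first space. It does not apply to people.txt, which uses a different, comma-separated layout.

```python
from collections import defaultdict


def parse_index(lines):
    """Map each hash key to the list of values recorded for it, in file order."""
    index = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only, so values such as paths may contain spaces.
        key, _, value = line.partition(" ")
        index[key].append(value)
    return dict(index)


sample = [
    "ABCDEF1234567890 20230710123000",
    "ABCDEF1234567890 20230711143000",
    "1234567890ABCDEF 20230710123000",
]
mtimes = parse_index(sample)
```

Keeping a list per key preserves the history that files such as mtime.txt and loc_previous.txt record with repeated hashes.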

Example Workflow

1. Compute SHA-256 and MD5 Hashes

SHA-256 Hash: abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890
Truncated SHA-256 Hash: ABCDEF1234567890
MD5 Hash: abcdef1234567890abcdef1234567890
Truncated MD5 Hash: ABCDEF1234567890
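
As an illustration of this step, the sketch below computes both digests with Python's standard hashlib module, then uppercases and truncates them to 16 characters. The function name `truncated_hashes` and its `length` parameter are assumptions made for the example, not names from the project.

```python
import hashlib


def truncated_hashes(data: bytes, length: int = 16) -> tuple[str, str]:
    """Return (sha256, md5) hex digests, uppercased and cut to `length` characters."""
    sha = hashlib.sha256(data).hexdigest().upper()[:length]
    md5 = hashlib.md5(data).hexdigest().upper()[:length]
    return sha, md5


# Example input; a real run would hash the bytes of the file being committed.
sha_key, md5_key = truncated_hashes(b"hello")
```

For large files, reading in chunks and feeding each chunk to the hash object's `update()` method would avoid loading the whole file into memory.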

2. Normalize Filename

Original: Example Document.txt
Normalized: example.document.txt
Final Filename: example.document...ABCDEF1234567890...txt
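
The exact normalization rules are not spelled out here, so the following sketch only reproduces the example: it assumes that normalization means lowercasing and replacing runs of whitespace with a single dot, and that the truncated hash is placed between the stem and the extension, each side separated by three dots. Both function names are hypothetical, as is the handling of names without an extension.

```python
import re


def normalize_filename(name: str) -> str:
    """Lowercase the name and replace each run of whitespace with one dot (assumed rule)."""
    return re.sub(r"\s+", ".", name.strip().lower())


def final_filename(name: str, sha_key: str) -> str:
    """Insert the truncated hash between the normalized stem and the extension."""
    normalized = normalize_filename(name)
    stem, dot, ext = normalized.rpartition(".")
    if not dot:
        # Assumption: a name without an extension simply gets the hash appended.
        return f"{normalized}...{sha_key}"
    return f"{stem}...{sha_key}...{ext}"


result = final_filename("Example Document.txt", "ABCDEF1234567890")
```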

3. Commit File to CAS Datastore

CAS Path: /files/sha256/A/B/C/ABCDEF1234567890.txt
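
The example path suggests that files are sharded into nested directories named after the first three characters of the truncated hash. A sketch under that assumption follows; `cas_path` and its `root` default are illustrative names rather than part of the project.

```python
from pathlib import PurePosixPath


def cas_path(sha_key: str, ext: str, root: str = "/files/sha256") -> PurePosixPath:
    """Build the datastore path, sharding on the first three hash characters."""
    if len(sha_key) < 3:
        raise ValueError("hash key must have at least three characters")
    a, b, c = sha_key[:3]  # one directory level per leading character
    return PurePosixPath(root, a, b, c, f"{sha_key}.{ext}")


path = cas_path("ABCDEF1234567890", "txt")
```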

4. Extract Metadata and Update Index Files

Use tools like MediaInfo or ExifTool to extract metadata and manually update index files:

/index/sha256/vid/vcodec.txt:

ABCDEF1234567890 H.264

/index/sha256/vid/acodec.txt:

ABCDEF1234567890 AAC

/index/sha256/vid/vres.txt:

ABCDEF1234567890 1920x1080=2073600

/index/sha256/img/vres.txt:

ABCDEF1234567890 4000x3000=12000000

/index/sha256/aud/acodec.txt:

ABCDEF1234567890 AAC

/index/sha256/aud/bitrate.txt:

ABCDEF1234567890 320kbps

/index/sha256/loc_current.txt:

ABCDEF1234567890 /directory/to/file...ABCDEF1234567890...txt

/index/sha256/loc_previous.txt:

ABCDEF1234567890 /old/directory/to/file...ABCDEF1234567890...txt

/index/sha256/md5.txt:

ABCDEF1234567890 ABCDEF1234567890

/index/sha256/pref.txt:

ABCDEF1234567890 9

/index/sha256/source_url.txt:

ABCDEF1234567890 http://example.com/source1

/index/sha256/mtime.txt:

ABCDEF1234567890 20230710123000
ABCDEF1234567890 20230711143000

/index/sha256/ext.txt:

ABCDEF1234567890 mp4

/index/sha256/mime.txt:

ABCDEF1234567890 video/mp4

/index/sha256/deleted.txt:

ABCDEF1234567890 20230715123000

/index/sha256/lost.txt:

ABCDEF1234567890 20230720123000

/index/sha256/utags.txt:

ABCDEF1234567890 DIRECTORY

/index/sha256/ftags.txt:

ABCDEF1234567890 this
ABCDEF1234567890 file

/index/people.txt:

John Doe, dob=19800101, occupation=actor
Jane Smith, dob=19750101, occupation=actress
Alice Johnson, dob=19900101, occupation=director
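
Rather than editing the files by hand, the updates in this step could be scripted. The sketch below assumes that the resolution value is width x height followed by the pixel count, that timestamps use the YYYYMMDDHHMMSS layout seen in the examples, and that new entries are appended to the end of each index file. The helper names are hypothetical, and the demo writes to a temporary directory instead of /index.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def format_resolution(width: int, height: int) -> str:
    """Render a resolution as WIDTHxHEIGHT=PIXELS, as in the vres.txt examples."""
    return f"{width}x{height}={width * height}"


def format_timestamp(moment: datetime) -> str:
    """Render a timestamp as YYYYMMDDHHMMSS, as in the mtime.txt examples."""
    return moment.strftime("%Y%m%d%H%M%S")


def append_entry(index_root: Path, relative_name: str, sha_key: str, value: str) -> None:
    """Append one 'HASH value' line to an index file, creating it if needed."""
    path = index_root / relative_name
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"{sha_key} {value}\n")


# Demo against a throwaway index root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "index" / "sha256"
    key = "ABCDEF1234567890"
    append_entry(root, "vid/vres.txt", key, format_resolution(1920, 1080))
    append_entry(root, "mtime.txt", key, format_timestamp(datetime(2023, 7, 10, 12, 30, 0)))
    vres_text = (root / "vid" / "vres.txt").read_text(encoding="utf-8")
    mtime_text = (root / "mtime.txt").read_text(encoding="utf-8")
```

Appending keeps earlier lines intact, which matches files such as mtime.txt where one hash carries several entries.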
