
smolstr - 5 bit encoding

smolstr is a concept I once had for saving space by storing text in 5 bits per character instead of the usual 8 bits (one byte per ASCII character in UTF-8).
This is nothing more than an experiment. Feel free to use it anywhere: it's open source and should be pretty straightforward.

From my own testing, the encoded output is on average about 2/3 the size of the original text (a 3:2 size ratio).
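
A quick back-of-the-envelope check of that figure, using only the layout described under file format (25 five-bit characters per 128-bit block) and assuming plain one-byte-per-character input that uses only alphabet characters:

```latex
\text{ideal: } \frac{8\ \text{bits}}{5\ \text{bits}} = 1.6
\qquad
\text{per u128 block: } \frac{25 \times 8\ \text{bits}}{128\ \text{bits}} = \frac{200}{128} = 1.5625
```

Uppercase and number-mode markers, the 36-byte header, and the padding of the last block all pull the real ratio below this ceiling, which is consistent with the measured figure of roughly 3:2.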

If you want to test this encoder, I've left a tool in the releases tab that anyone who has used a command prompt should be able to figure out. You can also just drag and drop .txt files onto the .exe.

file format

The file format is built from three blocks:

"THE MAGIC NUMBER" -> [4 bytes, always the text "smol" (ASCII/UTF-8)]

"ALPHABET" -> [32 bytes, one byte per character, stored as raw text]

"DATA" -> [variable length, always padded to a whole number of u128 blocks (16 bytes each, unsigned 128-bit integer)]
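
A minimal Python sketch of assembling those three blocks, assuming the data section has already been packed into integers that each fit in a u128. The README does not state the byte order of each u128, so little-endian is an assumption here, and `build_file` is a hypothetical helper name.

```python
MAGIC = b"smol"
DEFAULT_ALPHABET = " abcdefghijklmnopqrstuvwxyz.!?12"


def build_file(alphabet: str, words: list[int]) -> bytes:
    """Concatenate magic number + alphabet + data blocks.

    Assumption: each u128 is written little-endian (not specified in the README).
    """
    alpha_bytes = alphabet.encode("ascii")  # one byte per alphabet character
    if len(alpha_bytes) != 32:
        raise ValueError("alphabet must be exactly 32 one-byte characters")
    out = bytearray(MAGIC)  # 4 bytes
    out += alpha_bytes      # 32 bytes
    for word in words:
        out += word.to_bytes(16, "little")  # 16 bytes = one u128
    return bytes(out)
```

Every file therefore has a fixed 36-byte header followed by `16 * len(words)` bytes of data.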

Reading the file format is pretty simple. You could skip the first 36 bytes (4 for the magic number + 32 for the alphabet), but it's recommended to read which alphabet the file uses, otherwise the text will come out as gibberish.
Decoding the data requires some bit shifting, so you need at least a basic understanding of how bit shifting works.
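
Reading the header back can be sketched in Python as follows. `read_file` is a hypothetical name, and little-endian u128 blocks are again an assumption, since the README does not document the byte order.

```python
MAGIC = b"smol"
HEADER_SIZE = 4 + 32  # magic number + alphabet = 36 bytes


def read_file(data: bytes) -> tuple[str, list[int]]:
    """Return (alphabet, list of u128 data blocks) from a smolstr file."""
    if data[:4] != MAGIC:
        raise ValueError("not a smolstr file: bad magic number")
    if len(data) < HEADER_SIZE or (len(data) - HEADER_SIZE) % 16 != 0:
        raise ValueError("data section is not a whole number of u128 blocks")
    alphabet = data[4:HEADER_SIZE].decode("ascii")
    payload = data[HEADER_SIZE:]
    # Assumption: each u128 is stored little-endian.
    words = [int.from_bytes(payload[i:i + 16], "little")
             for i in range(0, len(payload), 16)]
    return alphabet, words
```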

writing

Every time the current u128 runs out of bits (every 25 characters, since 25 × 5 = 125 of its 128 bits), it is pushed to the output file and a new u128 is started. Rinse and repeat until there is no data left; the last u128 is then pushed as well, padded out to the full 16 bytes.
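
The writing loop above can be sketched in Python, assuming the text has already been turned into alphabet indices (0-31). The README does not say which end of the u128 is filled first, so this sketch puts the first character in the lowest 5 bits; that bit order and the name `pack` are assumptions. Python integers are arbitrary precision, so a plain `int` stands in for a u128.

```python
BITS = 5
CHARS_PER_WORD = 25  # 25 * 5 = 125 of the 128 bits in a u128


def pack(indices: list[int]) -> list[int]:
    """Pack 5-bit alphabet indices into u128-sized integers.

    Assumption: the first character of each word goes into the lowest 5 bits.
    """
    words = []
    current, used = 0, 0
    for idx in indices:
        if not 0 <= idx < 32:
            raise ValueError(f"index {idx} does not fit in 5 bits")
        current |= idx << (BITS * used)
        used += 1
        if used == CHARS_PER_WORD:  # word is full: push it and start a new one
            words.append(current)
            current, used = 0, 0
    if used:  # push the last, partially filled word; unused bits stay zero
        words.append(current)
    return words
```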

reading

Reading is the same as writing but in reverse: read a u128, go through it 5 bits at a time until there are no bits left, then drop the current u128 and read the next one. When the end of the file is reached, you are done!
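
A matching Python sketch of the reading loop, assuming the same lowest-bits-first order (not stated in the README). `unpack` is a hypothetical name. If the padding in the last u128 is zeros, as assumed here, it decodes as index 0, which is the space in the default alphabet, so decoded text can end with trailing spaces.

```python
CHARS_PER_WORD = 25
MASK = 0b11111  # selects the lowest 5 bits


def unpack(words: list[int]) -> list[int]:
    """Read 25 five-bit indices out of each u128, lowest bits first (assumed)."""
    indices = []
    for word in words:
        for i in range(CHARS_PER_WORD):
            indices.append((word >> (5 * i)) & MASK)
        # the top 3 bits of each u128 are unused and ignored
    return indices
```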

what are the 1 and 2 in the standard alphabet?

  • pushing a 1 toggles "number mode": every character read while it is active is replaced by its index in the alphabet, giving a number (a = 1, b = 2, etc.). Example: "1aabb1" results in the number "1122"
  • pushing a 2 either makes the following letter uppercase or, if the following character is a space, inserts a newline character (\n)*

*this might be changed in the future
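
Both markers can be handled in a single pass over the decoded characters. This Python sketch reproduces the "1aabb1" → "1122" example above. How indices outside 1-9 are written in number mode is not specified, so `apply_markers` (a hypothetical name) simply emits the decimal index; it also assumes the space after a 2 is consumed by the newline.

```python
def apply_markers(symbols: str, alphabet: str) -> str:
    """Turn decoded alphabet characters into text, handling the 1 and 2 markers.

    Assumptions (not specified in the README):
    - in number mode each character becomes str(its alphabet index);
    - "2" followed by a space becomes "\\n" and the space is consumed;
    - a "2" at the very end of the input is dropped.
    """
    out = []
    number_mode = False
    i = 0
    while i < len(symbols):
        ch = symbols[i]
        if ch == "1":  # toggle number mode
            number_mode = not number_mode
        elif ch == "2":  # uppercase next letter, or newline if next is a space
            if i + 1 < len(symbols):
                nxt = symbols[i + 1]
                out.append("\n" if nxt == " " else nxt.upper())
                i += 1  # the following character has been consumed
        elif number_mode:
            out.append(str(alphabet.index(ch)))
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```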

rules

  • the charset can be changed, BUT it must be EXACTLY 32 characters. The special characters (space, 1, 2) CANNOT be changed;
    they must stay the same because they have special meanings. The default is " abcdefghijklmnopqrstuvwxyz.!?12"
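
A custom charset can be checked before writing a file. In this Python sketch, `validate_alphabet` is a hypothetical name; requiring the special characters to keep their default positions (space at 0, 1 at 30, 2 at 31) and requiring all 32 characters to be unique are assumptions, since the README only says the special characters cannot be changed.

```python
DEFAULT_ALPHABET = " abcdefghijklmnopqrstuvwxyz.!?12"
# Assumption: special characters keep their positions from the default alphabet
SPECIAL_POSITIONS = {0: " ", 30: "1", 31: "2"}


def validate_alphabet(alphabet: str) -> None:
    """Raise ValueError if the alphabet breaks the charset rules."""
    # 32 ASCII characters = exactly 32 bytes in the ALPHABET block
    if len(alphabet) != 32 or not alphabet.isascii():
        raise ValueError("alphabet must be exactly 32 one-byte characters")
    # Assumption: each character must map to exactly one index
    if len(set(alphabet)) != 32:
        raise ValueError("alphabet characters must be unique")
    for pos, ch in SPECIAL_POSITIONS.items():
        if alphabet[pos] != ch:
            raise ValueError(f"special character {ch!r} must stay at index {pos}")
```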

contributors

reezey