mojibake's Introduction

mojibake

Encode and decode arbitrary bytes as a sequence of emoji optimized to produce the smallest number of graphemes.

Description

This is not a space-efficient library.

Services such as Twitter and Mastodon generally limit the length of a post by its grapheme count, not by the underlying character (code point) count. A single emoji grapheme often consists of several characters, each of which is a multi-byte sequence.
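As a rough illustration of the gap between these counts, the sketch below measures one flag emoji using only the Python standard library. The choice of Python and of this particular emoji is an assumption made for the example, not something taken from the library. The standard library has no grapheme segmentation, so the grapheme count of 1 is stated in a comment rather than computed.

```python
# Vietnam flag: two regional-indicator code points that display as one grapheme.
flag = "\U0001F1FB\U0001F1F3"

code_points = len(flag)                  # len() on a str counts code points
utf8_bytes = len(flag.encode("utf-8"))   # each regional indicator takes 4 bytes in UTF-8

print(code_points, utf8_bytes)  # 2 8
```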

Therefore, if you can encode more data per grapheme, you can transmit more information within the same limit, even though the encoded message takes up far more bytes than the original data.

There are at least 2048 unique emoji graphemes in the Unicode specification. Since 2048 = 2^11, an emoji is actually just an 11-bit unsigned integer with extra steps.

This library packs bytes into 11-bit unsigned integers, which are then mapped to sequences of Unicode characters that each display as a single grapheme.
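The packing step can be sketched as follows. This is a minimal illustration in Python, not the library's actual implementation: the function names `pack_11bit` and `unpack_11bit` are hypothetical, the final value is assumed to be zero-padded on the right, and the decoder is assumed to be told the original byte length, because trailing padding can otherwise be mistaken for an extra zero byte. The mapping from each 11-bit value to an emoji is omitted.

```python
def pack_11bit(data: bytes) -> list[int]:
    """Pack bytes into 11-bit unsigned integers (hypothetical helper)."""
    values = []
    acc = 0      # bit accumulator
    nbits = 0    # number of unconsumed bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 11:
            nbits -= 11
            values.append((acc >> nbits) & 0x7FF)
        acc &= (1 << nbits) - 1          # keep only the unconsumed low bits
    if nbits:
        # Assumption: zero-pad the last partial value on the right.
        values.append((acc << (11 - nbits)) & 0x7FF)
    return values


def unpack_11bit(values: list[int], length: int) -> bytes:
    """Inverse of pack_11bit; `length` is the original byte count (assumed known)."""
    out = bytearray()
    acc = 0
    nbits = 0
    for value in values:
        acc = (acc << 11) | value
        nbits += 11
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    return bytes(out[:length])           # drop any byte produced by padding


packed = pack_11bit(b"\xff\xff")
print(packed)                            # [2047, 1984]
print(unpack_11bit(packed, 2))           # b'\xff\xff'
```

Every value produced is in the range 0..2047, so each one can index a table of 2048 emoji graphemes.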

Example

Original Text:
 Value: Shrek 2 was the greatest film ever made!!
 Bytes: 41,
 Characters: 41,
 Graphemes: 41

Mojibake Encoded:
 Value: πŸ‡»πŸ‡³πŸ‘ŒπŸΏπŸͺ€πŸ”ΆπŸ«³πŸΏπŸ§πŸ»πŸ“ΌπŸ•ΊπŸΎπŸ€›πŸ»πŸ¦ΊπŸ€΅πŸ½πŸ‘¦πŸΌπŸ—„οΈπŸ’†πŸΏβš—οΈβ†—οΈ2️⃣πŸ§₯πŸ€΅πŸ»πŸ•€πŸ™†πŸ«šπŸͺ™πŸ˜ŸπŸ‡¦πŸ‡ͺπŸ«³πŸ½πŸ‡ΈπŸ‡²πŸ˜ΉπŸ΄σ §σ ’σ ³σ £σ ΄σ ΏπŸ›ŒπŸ»
 Bytes: 210,
 Characters: 55,
 Graphemes: 30

Decoded Text:
 Value: Shrek 2 was the greatest film ever made!!
 Bytes: 41,
 Characters: 41,
 Graphemes: 41
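The grapheme count of the encoded value follows from the 11-bit packing: 41 bytes are 328 bits, which need 30 values of 11 bits each. A short check in Python (the variable names are illustrative only):

```python
text = "Shrek 2 was the greatest film ever made!!"

n_bytes = len(text.encode("utf-8"))    # 41 bytes, all ASCII
n_bits = n_bytes * 8                   # 328 bits
n_graphemes = (n_bits + 10) // 11      # ceiling division by 11

print(n_bytes, n_bits, n_graphemes)    # 41 328 30
```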

mojibake's People

Contributors

minisculegirraffe, dependabot[bot], bondo

mojibake's Issues

Add a command line client

In addition to the library, I'd also like to add a command line application that can encode and decode values/files.

Add encoding optimizing for Grapheme Clusters instead of Graphemes

While services often use grapheme count for character limits, the better analog for number of visual elements is grapheme clusters.

An encoding that takes advantage of the zero-width joiner (ZWJ) to encode grapheme clusters made of multiple graphemes (e.g. gender, skin tone modifiers) should improve the visual density of encoded information. As a bonus, this will also increase the diversity of generated emoji.
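For context, a ZWJ sequence joins existing emoji with U+200D so that supporting platforms render them as one glyph. The sketch below builds one such sequence in Python. The choice of emoji is an assumption made for the example, and whether a given service counts the result as one character is not something this snippet can verify.

```python
ZWJ = "\u200D"              # zero-width joiner
woman = "\U0001F469"        # woman
rocket = "\U0001F680"       # rocket

# woman + ZWJ + rocket is the "woman astronaut" sequence.
astronaut = woman + ZWJ + rocket

code_points = len(astronaut)                  # 3 code points
utf8_bytes = len(astronaut.encode("utf-8"))   # 4 + 3 + 4 bytes

print(code_points, utf8_bytes)  # 3 11
```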
