saxbophone / basest-python

Arbitrary base binary-to-text encoder (any base to any base), in Python.

Home Page: https://pypi.org/project/basest/

License: Mozilla Public License 2.0

Python 98.80% Makefile 1.20%
number-base-converter encoder decoder conversion encoding decoding encodings base64 base58 base85

basest-python's Introduction

I've been programming for a little over a decade now, to a professional level for several years.

My professional experience is mostly in web dev, but I also have some Windows desktop application experience.

Projects I am particularly proud of:

github.com/saxbophone/arby arby is a C++ library implementing arbitrary-precision arithmetic, both at runtime and compile-time!
It exposes convenient-to-use class types encapsulating the arithmetic, with operator overloading and standard stream support.
github.com/saxbophone/hexago hexago is a cross-platform screensaver written in C++. It draws pretty shrinking hexagons!
It integrates with the screensaver frameworks of both macOS and Windows (with some Objective-C++ glue code for the former!)
Cross-platform C++20 project template This is a Github project template intended for cross-platform C++20 dev.
It includes an extensive CMake project config with lots of warning options enabled, and Github Actions CI config for unit testing on Linux, macOS and Windows.
I use it for all my stuff and other people seem to find it useful too.
github.com/saxbophone/libsxbp sxbp and its implementation library, libsxbp are a pair of C projects exploring unconventional barcodes and procedural image generation.
They implement a novel barcode of my own design, where binary bits are encoded by guiding the line of a right-angled spiral left or right as prescribed by the input data.
Unfortunately, producing a spiral compact enough not to waste lots of empty space in the resulting image is very computationally expensive for barcodes longer than about 20 bits, but it was a fun and interesting experiment and good practice at writing a well-documented C API with callbacks and error-handling.
github.com/saxbophone/unmoving unmoving is a C++20 baremetal library providing more convenient support for fixed-point arithmetic as used on the PlayStation.
Getting a cutting-edge version of G++ to cross-compile for the PlayStation and programming within the constraints of bare metal was a fun challenge!

Other fun stuff

github.com/saxbophone/wondercard Emulating the communications protocol used for PS1 memory cards in software
github.com/saxbophone/tr-sort Experimental sorting algorithm which attempts to calculate the rough position each element should be
github.com/saxbophone/colour-distance Web app for finding colours that are "n distance away from" a given colour, intended for interior design
github.com/saxbophone/triangberg Just for fun, animated geometrically-constructed fractal-like arrangements of triangles
github.com/saxbophone/zench C++ Z-machine interpreter, work in progress
github.com/saxbophone/galley Galois Field arithmetic using compile-time-generated lookup tables
github.com/saxbophone/dengr Partial reverse-engineering of the low-level data encoding of Compact Discs
github.com/saxbophone/lzw-bit Bit-by-bit LZW compression with redundant-code-elimination

basest-python's People

Contributors

saxbophone

basest-python's Issues

Encoder/Decoder corruption for some larger output bases

Encountered an issue decoding symbols that were encoded from base 128 to base 255.
I have a hunch that this is because the ratios are not exact and the output base is larger than the input base.

Currently, for all cases when decoding, empty padding symbols are converted to MAX just before decoding, like in base-85. I think this might only work when the input base is larger than the output base, so a different approach may be needed for when the output base is larger.

Code for Encoder class:

from basest.encoders import Encoder


class StrictAsciiSquashEncoder(Encoder):
    input_base = 128
    output_base = 255
    input_ratio = 9
    output_ratio = 8
    # The Strict ASCII Set
    input_symbol_table = [
        s for s in
        '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
        '\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
        ' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_'
        '`abcdefghijklmnopqrstuvwxyz{|}~\x7f'
    ]
    # Bytes 0 to 254
    output_symbol_table = [
        s for s in
        '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
        '\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
        ' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_'
        '`abcdefghijklmnopqrstuvwxyz{|}~\x7f'
        '\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f'
        '\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f'
        '\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf'
        '\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf'
        '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf'
        '\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf'
        '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef'
        '\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe'
    ]
    padding_symbol = '\xff'

Sample decoding errors:

>>> sa = StrictAsciiSquashEncoder()
>>> 
>>> ''.join(sa.encode('slartybartfast'))
'z_\x92d$\xceW\xce\xf6\x11t\x0b\xff\xff\xff'
>>> sa.decode(''.join(sa.encode([s for s in 'slartybartfast'])))
['s', 'l', 'a', 'r', 't', 'y', 'b', 'a', 'r', 't', 'f', 'a', 's', 'v']
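
The "ratios are not exact" hunch can be checked with plain integer arithmetic. The sketch below does not use basest at all; `slack` is a name made up for illustration. It counts how many values of one output chunk have no corresponding input chunk, for the 9:8 ratio above and, as a contrast, for Base64's 3:4 ratio.

```python
def slack(input_base, input_ratio, output_base, output_ratio):
    """Return how many output-chunk values no input chunk can map onto."""
    return output_base ** output_ratio - input_base ** input_ratio


# Base64: 3 bytes map onto 4 symbols exactly, so nothing is left over.
print(slack(256, 3, 64, 4))

# The encoder from this issue: 9 base-128 symbols fit into 8 base-255
# symbols, but the two chunk capacities are not equal.
print(slack(128, 9, 255, 8))
```

A non-zero result means that some sequences of output symbols, including ones produced by substituting padding before decoding, need not correspond to any valid input chunk.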

Create custom error classes

Custom error classes should be created for the library, inheriting from Python's built-in exception classes.

This will make testing easier and prove that exceptions are being raised by code of mine that specifically checks for error conditions and not because of default Python error-handling.
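
One possible shape for such a hierarchy is sketched below. Every class name and the `check_base` helper are hypothetical and chosen only for illustration; they are not taken from the library. Each error inherits from both a library-wide root and the matching built-in, so existing `except ValueError` code keeps working while tests can assert on the specific class.

```python
class BasestError(Exception):
    """Hypothetical root for errors raised deliberately by the library."""


class InvalidConfigurationError(BasestError, ValueError):
    """Hypothetical: bases, ratios or symbol tables are inconsistent."""


class WrongArgumentTypeError(BasestError, TypeError):
    """Hypothetical: an argument has the wrong type."""


def check_base(base):
    """Illustrative check that raises the custom errors defined above."""
    if not isinstance(base, int):
        raise WrongArgumentTypeError('base must be an integer')
    if base < 2:
        raise InvalidConfigurationError('base must be at least 2')
    return base
```

A test can then catch `InvalidConfigurationError` specifically, which would not pass if the failure came from an incidental `ValueError` raised somewhere else.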

Class-based Encoder system

Create a system where an Encoder base class can be inherited from and some attributes set which describe a custom encoding algorithm.

This could then be used to create example encoders/decoders for existing binary-to-text encoding systems such as base64, ascii85 etc...
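
As a rough illustration of the idea, here is a self-contained sketch in which a subclass describes an encoding purely through class attributes. `SketchEncoder` and `HexEncoder` are hypothetical names, the attribute names mirror those used in the issue above, and the sketch only handles complete input chunks (no padding).

```python
class SketchEncoder(object):
    """Hypothetical base class: subclasses only set the attributes below."""
    input_base = None
    output_base = None
    input_ratio = None
    output_ratio = None
    input_symbol_table = None
    output_symbol_table = None

    def encode(self, symbols):
        if len(symbols) % self.input_ratio != 0:
            raise ValueError('this sketch only handles complete chunks')
        output = []
        for start in range(0, len(symbols), self.input_ratio):
            # Read the chunk as one number written in the input base.
            number = 0
            for symbol in symbols[start:start + self.input_ratio]:
                index = self.input_symbol_table.index(symbol)
                number = number * self.input_base + index
            # Write that number as output_ratio digits in the output base.
            digits = []
            for _ in range(self.output_ratio):
                number, digit = divmod(number, self.output_base)
                digits.append(self.output_symbol_table[digit])
            output.extend(reversed(digits))
        return output


class HexEncoder(SketchEncoder):
    """Base16 described only by attributes: 1 byte becomes 2 hex digits."""
    input_base = 256
    output_base = 16
    input_ratio = 1
    output_ratio = 2
    input_symbol_table = [chr(i) for i in range(256)]
    output_symbol_table = list('0123456789ABCDEF')


print(''.join(HexEncoder().encode(list('Hi'))))  # prints 4869
```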

Split up unit test files

Some of these have become very large, and some contain more than one class.

These should be split out into separate files and where common parts are needed, these common parts should be refactored out to separate files for inclusion where needed.

Revise the class-based interface

This needs tidying up. There are two or three routes I can go down:

  • Remove the need to instantiate an encoder class by making the methods @classmethod
  • Change the paradigm to having the encoding settings set at object construction (decided against)
  • Use class inheritance far more to allow customisation at the inheritance level, using mixin classes e.g. there might be a StreamEncoder base class and a MappingEncoder mixin class. This would probably require deprecating the functional interface entirely.

Generator-based Streaming Encoder and Decoder interfaces

These would most likely replace the core encoder and decoder functions and would function in some way that would allow partial output of an encoded or decoded stream once it has received enough input.

E.g. say an encoding ratio of 4 to 5 symbols is being used, then the encoding generator would output two symbols after receiving two input symbols, as two symbols' input from a ratio of 4 is enough to calculate the values of two symbols' output for a ratio of 5.

This would have a potential speed efficiency improvement for encoding and decoding streams of data, and the other more traditional encode() and decode() functions could piggy-back on its functionality and just return the result as a list when done.
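
A minimal generator sketch of this idea follows. It is a simplification: it works on numbers rather than symbols, emits output only once a whole input chunk has arrived (coarser than the per-symbol partial output described above), and does not handle padding. The function names and argument order are assumptions made for illustration.

```python
def stream_encode(numbers, input_base, output_base, input_ratio, output_ratio):
    """Yield output digits as soon as each complete input chunk is read."""
    buffered = []
    for number in numbers:
        buffered.append(number)
        if len(buffered) == input_ratio:
            value = 0
            for digit in buffered:
                value = value * input_base + digit
            chunk = []
            for _ in range(output_ratio):
                value, digit = divmod(value, output_base)
                chunk.append(digit)
            for digit in reversed(chunk):
                yield digit
            buffered = []
    if buffered:
        raise ValueError('trailing partial chunk: padding is not sketched')


def encode(numbers, input_base, output_base, input_ratio, output_ratio):
    """Traditional interface that piggy-backs on the generator."""
    return list(stream_encode(numbers, input_base, output_base,
                              input_ratio, output_ratio))


# The bytes of "Man" as Base64 digit values (the indices of T, W, F, u).
print(encode([77, 97, 110], 256, 64, 3, 4))  # prints [19, 22, 5, 46]
```

Because `stream_encode` pulls from any iterable, it can be fed from a file or socket reader and starts producing output before the input is exhausted.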

Add validation of encoding/decoding options

Validation should be added to the functions that take options defining the parameters for a given custom encoding system, to ensure that they are given a sane configuration. This matters because:

  • It's not possible to encode from a smaller base to a larger one with padding (i.e. if given input data that is not the same length as the input window). This will corrupt the data and prevent the same from being retrieved verbatim.
  • Arguments should be type-checked.
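
A sketch of what such validation might look like is given below. The function name, its parameters and the exact set of checks are assumptions for illustration, not the library's behaviour. It covers type checks, sane ranges, chunk capacity and symbol table consistency; whether to reject smaller-to-larger base conversions that need padding is left out, since that is a design decision.

```python
def validate_options(input_base, output_base, input_ratio, output_ratio,
                     input_symbol_table, output_symbol_table, padding_symbol):
    """Raise TypeError or ValueError if the configuration is not sane."""
    numeric = (('input_base', input_base), ('output_base', output_base),
               ('input_ratio', input_ratio), ('output_ratio', output_ratio))
    for name, value in numeric:
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError('%s must be an integer' % name)
    if input_base < 2 or output_base < 2:
        raise ValueError('bases must be at least 2')
    if input_ratio < 1 or output_ratio < 1:
        raise ValueError('ratios must be at least 1')
    # One output chunk must be able to represent every input chunk.
    if input_base ** input_ratio > output_base ** output_ratio:
        raise ValueError('output chunk is too small for the input chunk')
    tables = ((input_symbol_table, input_base),
              (output_symbol_table, output_base))
    for table, base in tables:
        if len(table) != base:
            raise ValueError('symbol table length must equal its base')
        if len(set(table)) != len(table):
            raise ValueError('symbol table contains duplicates')
    if padding_symbol in output_symbol_table:
        raise ValueError('padding symbol must not be an output symbol')
```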

Decoder function

Create a decoder function that performs the inverse of basest.encode.

Setup Travis CI Builds

Use the Makefile for build commands.
Builds to be tested on the following CPython versions:

  • 2.7
  • 3.3
  • 3.4
  • 3.5

Add version of best_ratio() that searches based on range of output chunk sizes

Currently, basest.core.best_ratio() only allows a range of input chunk sizes to be given. This is inconvenient if the constraints are on the actual size of output chunk (say we want to know how many base-N symbols we can fit in 1KiB of space for instance).

Thus, an optional feature should be added allowing the output chunk size range to be specified instead.

It might also be possible to supply constraints for both input and output chunk sizes, but not sure how feasible this is.
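
The output-constrained search could work roughly as sketched below. This is a standalone illustration with a made-up name and signature, not the existing basest.core.best_ratio() interface. For each candidate output chunk size it finds the largest input chunk that still fits, then keeps the pair with the best input-to-output ratio, preferring the smaller chunk on ties.

```python
from fractions import Fraction


def best_ratio_for_output_sizes(input_base, output_base, output_sizes):
    """Return (input_size, output_size) with the highest efficiency."""
    best = None
    for output_size in output_sizes:
        capacity = output_base ** output_size
        # Largest input chunk whose every value fits in this output chunk.
        input_size = 0
        while input_base ** (input_size + 1) <= capacity:
            input_size += 1
        if input_size == 0:
            continue  # not even one input symbol fits
        efficiency = Fraction(input_size, output_size)
        # Strict comparison keeps the first (smallest) chunk on ties.
        if best is None or efficiency > best[0]:
            best = (efficiency, (input_size, output_size))
    return None if best is None else best[1]


# Bytes into base 85, allowing output chunks of 1 to 10 symbols.
print(best_ratio_for_output_sizes(256, 85, range(1, 11)))  # prints (4, 5)
```

Exact fractions are used for the comparison so that ties between ratios such as 4:5 and 8:10 are detected reliably.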

Fix setup.py so the package can be published to PyPi

It turns out that PyPi has changed the package upload process by quite a lot since I last started writing this project. My setup.py script now doesn't work at all!

There is a guide available on the PyPi project for how to migrate older projects, so I should read this and apply whichever parts are relevant to my own package-publishing process.

Note: should use test PyPi to check this is working properly.

Add more stress-tests

  • test partial input window with larger input base
  • test complete input window with smaller input base

Stress-tests

Write some more comprehensive stress-tests which check that the encoder and decoder functions successfully handle many different unusual output bases, including with partial input to check that padding works successfully across any output base.

Publish Package to PyPi

Prerequisites:

  • Choose a software license - Chosen - Mozilla Public License v2.0
  • Check the package works on PyPi - Use test PyPi
  • Build passing on all target platforms - Platforms to test: 2.7, 3.3, 3.4, 3.5

Add sample encoders

Perhaps these could be included in an examples module.

Well-known encoders I'd like to produce examples for:

  • Base64
  • Base64, URL-safe variant
  • Ascii85
  • Base85 (revised version of Ascii85 conforming to RFC 1924)
  • Z85 (ZeroMQ version of Ascii85)
  • Base91
  • Base32
  • Base16 (maybe too easy?)
  • Base36
  • Base58

With the exception of the base-85 schemes (which perform some additional kind of run-length encoding on certain output patterns), all of these should be rather trivial to implement as subclasses, and might serve as more helpful documentation and proof-of-concept to potential users.

Regarding the output transformation of the Ascii85 variants, it might be worth holding off until issue #27 is done (the pre-processing and post-processing ideas would be very helpful for such a scheme as Ascii85).

Raw encode and decode functions

These will accept and output numbers only, rather than symbols. The ordinary encode() and decode() functions would then change to be just wrappers around these, and convert to and from the different symbol sets.
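
The layering described here might look something like the following sketch. All function names and signatures are assumptions for illustration, and only complete chunks are handled. The raw functions work on numbers, and the symbol-level functions merely translate through the symbol tables before and after.

```python
def convert_chunks(from_base, to_base, from_ratio, to_ratio, numbers):
    """Convert whole chunks of digits from one base to another."""
    if len(numbers) % from_ratio != 0:
        raise ValueError('complete chunks only in this sketch')
    output = []
    for start in range(0, len(numbers), from_ratio):
        value = 0
        for digit in numbers[start:start + from_ratio]:
            value = value * from_base + digit
        chunk = []
        for _ in range(to_ratio):
            value, digit = divmod(value, to_base)
            chunk.append(digit)
        output.extend(reversed(chunk))
    return output


def raw_encode(input_base, output_base, input_ratio, output_ratio, numbers):
    return convert_chunks(input_base, output_base,
                          input_ratio, output_ratio, numbers)


def raw_decode(input_base, output_base, input_ratio, output_ratio, numbers):
    # Decoding runs the same conversion in the opposite direction.
    return convert_chunks(output_base, input_base,
                          output_ratio, input_ratio, numbers)


def encode(input_table, output_table, input_ratio, output_ratio, symbols):
    """Symbol-level wrapper: look up indices, then defer to raw_encode."""
    numbers = [input_table.index(s) for s in symbols]
    raw = raw_encode(len(input_table), len(output_table),
                     input_ratio, output_ratio, numbers)
    return [output_table[n] for n in raw]


def decode(input_table, output_table, input_ratio, output_ratio, symbols):
    """Symbol-level wrapper: look up indices, then defer to raw_decode."""
    numbers = [output_table.index(s) for s in symbols]
    raw = raw_decode(len(input_table), len(output_table),
                     input_ratio, output_ratio, numbers)
    return [input_table[n] for n in raw]
```

For example, with a 256-entry byte table and the standard Base64 alphabet, `encode(byte_table, base64_table, 3, 4, list('Man'))` gives the symbols of `TWFu`, and `decode` with the same tables reverses it.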

Cleanup files after packaging fixes

  • I think MANIFEST.in can be removed from source control, as it appears to be auto-generated.
  • Change the project description to that currently used as the Github repo description: Arbitrary base binary-to-text (or anything to anything) encoder.
