This project forked from muterra/py_smartyparse

SmartyParse: A Python library for smart dynamic binary de/encoding.

License: GNU Lesser General Public License v2.1

Python 100.00%

py_smartyparse's Introduction

SmartyParse

What is SmartyParse?

SmartyParse is a binary packing/unpacking (aka building/parsing) library for arbitrary formats, written for Python >= 3.3. If you have a defined binary format (.tar, .bmp, byte-oriented network packets, etc.) or are developing one, SmartyParse is a way to convert those formats to and from Python objects. Its most direct alternative is Construct, which is admittedly much more mature.

As an explicit warning, this is a very, very new library, and you are likely to run into some bugs. Pull requests are welcome, and I apologize for the sometimes messy source.

What makes SmartyParse different?

SmartyParse, first and foremost, was built to support self-describing formats. Though it is (to an extent) possible to create these in declarative parsing libraries like Construct, it is very tedious, and requires a substantial amount of extra code.

Fundamentally that means there are three big differences between SmartyParse and Construct:

  1. SmartyParse is highly Pythonic and very intuitive. Construct requires learning a specialized Construct descriptive format.
  2. SmartyParse is imperative. Construct is declarative.
  3. SmartyParse supports running arbitrary callbacks during the parsing process.

Otherwise, Construct and SmartyParse are functionally similar (though for the record, SmartyParse doesn't yet natively support bit-oriented formats, which Construct does).

Installation

SmartyParse is currently in pre-release alpha status. It can be installed with pip, but you must explicitly allow pre-release versions, like this:

pip install --pre smartyparse

SmartyParse has no external dependencies at this time (beyond the standard library), though building it from source requires pandoc and pypandoc:

sudo apt-get install pandoc
pip install pypandoc

Example usage

See /doc for full API documentation.

Declaring a simple length -> data object:

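Offset      Length  Description
0           4       Int32 U, n
4           n       Blob

from smartyparse import SmartyParser
from smartyparse import ParseHelper
from smartyparse import parsers

unknown_blob = SmartyParser()
# 4-byte unsigned integer holding the length n of the blob that follows
unknown_blob['length'] = ParseHelper(parsers.Int32(signed=False))
# Blob whose length is not known until 'length' has been parsed
unknown_blob['data'] = ParseHelper(parsers.Blob())
# Tie the two fields together so 'length' describes 'data'
unknown_blob.link_length(data_name='data', length_name='length')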
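
For readers who want to see what a length -> data layout looks like at the byte level, here is a minimal sketch using only the standard library's struct module. It is not how SmartyParse is implemented, and the helper names pack_blob and unpack_blob are hypothetical. The big-endian byte order is also an assumption, since the table above does not specify one.

```python
import struct


def pack_blob(data: bytes) -> bytes:
    """Prefix data with its length as an unsigned 32-bit integer."""
    # ">I" means big-endian unsigned 32-bit; the byte order is an assumption.
    return struct.pack(">I", len(data)) + data


def unpack_blob(packed: bytes, offset: int = 0):
    """Return (data, next_offset) for a length-prefixed blob at offset."""
    (length,) = struct.unpack_from(">I", packed, offset)
    start = offset + 4
    end = start + length
    if end > len(packed):
        raise ValueError("declared length exceeds available bytes")
    return packed[start:end], end


packed = pack_blob(b"Hello world!")
data, next_offset = unpack_blob(packed)
```

The length field has to be read before the blob can be sliced out, which is the dependency that link_length expresses in the SmartyParse declaration.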

Nesting that to define a simple file:

Offset      Length  Description
0           4       Magic 'test'
4           4       Int32 U, n
8           n       Blob
8 + n       4       Int32 U, m
12 + n      m       Blob
12 + n + m  4       Int32 U

test = SmartyParser()
test['magic'] = ParseHelper(parsers.Blob(length=4))
test['blob1'] = unknown_blob
test['blob2'] = unknown_blob
test['tail'] = ParseHelper(parsers.Int32(signed=False))

An object to pack into the above:

test_obj = {
    'magic': b'test',
    'blob1': {
        'data': b'Hello world!'
    },
    'blob2': {
        'data': b'Hello, world?'
    },
    'tail': 123
}

Why the awkward dict for the blobs? Well, because SmartyParser objects aren't usually intended for things as simple as a length <-> value pair. It would make a lot more sense if it were 'header' and 'body', wouldn't it?

Packing and recycling the above object:

>>> packed = test.pack(test_obj)
>>> test_obj_reloaded = test.unpack(packed)
>>> test_obj == test_obj_reloaded
True
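
To make the offsets in the file layout table concrete, the following sketch builds the same structure by hand with the standard library. It does not use SmartyParse and makes no claim about the exact bytes test.pack produces. The function names and the big-endian byte order are assumptions made for illustration.

```python
import struct

# Big-endian unsigned 32-bit integer; the byte order is an assumption.
U32 = struct.Struct(">I")


def pack_file(blob1: bytes, blob2: bytes, tail: int) -> bytes:
    """Magic, two length-prefixed blobs, then a trailing Int32 U."""
    parts = [b"test"]
    for blob in (blob1, blob2):
        parts.append(U32.pack(len(blob)))
        parts.append(blob)
    parts.append(U32.pack(tail))
    return b"".join(parts)


def unpack_file(packed: bytes) -> dict:
    """Walk the layout in order, reading each length before its blob."""
    if packed[:4] != b"test":
        raise ValueError("bad magic")
    offset = 4
    blobs = []
    for _ in range(2):
        (length,) = U32.unpack_from(packed, offset)
        offset += 4
        blobs.append(packed[offset:offset + length])
        offset += length
    (tail,) = U32.unpack_from(packed, offset)
    return {"magic": packed[:4], "blob1": blobs[0], "blob2": blobs[1], "tail": tail}


packed = pack_file(b"Hello world!", b"Hello, world?", 123)
n, m = len(b"Hello world!"), len(b"Hello, world?")
tail_offset = 12 + n + m  # last row of the layout table
```

With n = 12 and m = 13, the second length field starts at offset 8 + n = 20 and the tail at 12 + n + m = 37, for a total of 41 bytes.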

Supporting SmartyParse

SmartyParse is under development as part of the Muse protocol implementation used in the Ethyr encrypted email-like messaging application. If you would like to support SmartyParse, Muse, or Ethyr, please consider contributing to our IndieGoGo campaign.

Todo

(In no particular order)

  • Change SmartyParserObject to use slots for storage, but not for item names (essentially removing attribute-style access, which isn't documented anyway)
  • Add self-describing format to example usage
  • Write .bmp library showcase
  • Move/mirror documentation to readthedocs
  • Add padding generation method (in addition to constant byte)
  • Add pip version badge: [![PyPi version](https://pypip.in/v/$REPO/badge.png)](https://github.com/Muterra/py_smartyparse) above.
  • Support bit orientation
  • Support endianness of binary blobs (aka transforming from little to big)
  • Support memoization of static SmartyParsers for extremely performant parsing
  • Support memoization of partially-static SmartyParsers for better-than-completely-dynamic parsing
  • Autogeneration of integration test suite from API spec in /doc/
  • Random self-describing format declaration and testing
  • Performance testing
  • Add customized pep8 to codeclimate testing, as per (as yet unpublished) Muterra code style guide
  • Support for "end flags" for indeterminate-length lists

Done!

  • Add passing of parent SmartyParser to callback system. Added in 0.1a4 with the @references(referent) decorator.
  • Clean up callback API. Added in 0.1a4

Misc API notes

  • SmartyParser fieldnames currently must be valid identifier strings (anything you could assign as an attribute). If you want to programmatically check validity, use 'foo'.isidentifier(), but SmartyParser will raise an error if you try to assign an invalid fieldname. This is the result of using __slots__ for some memory optimization, which is a compromise between default dict behavior and memory use. If you're parsing a ton of objects, it will be very helpful for memory consumption.
  • Due to numeric imprecision, floats and doubles can potentially break equivalence (i.e. start == reloaded) when comparing an object before and after a pack -> unpack round trip.
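
Both notes above can be tried out with the standard library alone. The sketch below does not touch SmartyParse: the field names are made up, and struct stands in for whatever float encoding a given format uses, so treat it as an illustration of the underlying behavior only.

```python
import math
import struct

# Field-name check: str.isidentifier() is a built-in str method.
candidates = ["header", "blob1", "2nd_blob", "my-field", ""]
valid = {name: name.isidentifier() for name in candidates}

# Round trip a Python float through 32-bit and 64-bit IEEE 754 encodings.
original = 0.1
(reloaded32,) = struct.unpack(">f", struct.pack(">f", original))
(reloaded64,) = struct.unpack(">d", struct.pack(">d", original))

float32_equal = original == reloaded32  # precision is lost in 32 bits
float64_equal = original == reloaded64  # a Python float is already 64-bit

# Comparing with a tolerance sidesteps the exact-equality problem.
close_enough = math.isclose(original, reloaded32, rel_tol=1e-6)
```

In this sketch only the 32-bit round trip changes the value. Special values such as NaN never compare equal to themselves, so a tolerance-based or field-by-field comparison is the safer choice when testing round trips that contain floating-point fields.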
