
parsingdata / metal


A Java library for parsing binary data formats, using declarative descriptions.

License: Apache License 2.0

Java 100.00%
metal java-library parsing parser-combinators binary-data java data-parsing parser parser-library

metal's Introduction

Metal



Using Metal

Metal releases are available in the Maven Central repository. To use the latest release of Metal (10.0.0), add the following section to your pom.xml under dependencies:

<dependency>
  <groupId>io.parsingdata</groupId>
  <artifactId>metal-core</artifactId>
  <version>10.0.0</version>
</dependency>

In addition, snapshots are published to GitHub Packages. To use them, add the following section to your pom.xml under repositories:

<repository>
  <id>github-metal-snapshots</id>
  <url>https://maven.pkg.github.com/parsingdata/metal</url>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>

Please read the Authenticating to GitHub Packages documentation to learn how to give Maven access to the repository.

License

Copyright 2013-2024 Netherlands Forensic Institute
Copyright 2021-2024 Infix Technologies B.V.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

metal's People

Contributors

akaidiot, ccreeten, dependabot-preview[bot], dependabot-support, dependabot[bot], jvdb, mvanaken, rdvdijk


metal's Issues

Clean up creating of ParseResult instances

The Token implementations are now littered with calls to the ParseResult constructor.

Let's investigate if we can clean this up by providing static factory methods instead.
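A minimal sketch of the static-factory idea, assuming a simplified result that carries only a success flag and an environment. SketchParseResult, success() and failure() are hypothetical names, not Metal's actual API; the point is that a private constructor forces Token implementations to state their intent through a named factory.

```java
import java.util.Objects;

// Hypothetical stand-in for ParseResult; the real class carries different state.
final class SketchParseResult {
    private final boolean succeeded;
    private final String environment; // placeholder for the real environment type

    // Private constructor: callers must use the named factory methods below.
    private SketchParseResult(final boolean succeeded, final String environment) {
        this.succeeded = succeeded;
        this.environment = Objects.requireNonNull(environment, "environment");
    }

    static SketchParseResult success(final String environment) {
        return new SketchParseResult(true, environment);
    }

    static SketchParseResult failure(final String environment) {
        return new SketchParseResult(false, environment);
    }

    boolean succeeded() { return succeeded; }

    String environment() { return environment; }
}
```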

Add Len ValueExpression

Len(String name) refers to a (Parse)Value, just like its Ref() counterparts, but instead of returning the value, it returns the value's length in bytes.
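A sketch of the intended behaviour, assuming a simplified model in which a parsed value is just a name plus its bytes, and a name may match several values. NamedValue and LenSketch are hypothetical names used only for illustration.

```java
import java.util.List;

// Hypothetical, simplified parsed value: a name plus its raw bytes.
record NamedValue(String name, byte[] data) {}

// Sketch of a Len-style expression: it finds values by name, as a Ref would,
// but yields each value's length in bytes instead of the value itself.
final class LenSketch {
    private final String name;

    LenSketch(final String name) {
        this.name = name;
    }

    List<Integer> eval(final List<NamedValue> parsed) {
        return parsed.stream()
            .filter(value -> value.name().equals(name))
            .map(value -> value.data().length)
            .toList();
    }
}
```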

Clean up multimodule pom and Maven warnings

Some of the issues:

  1. Separate descriptions for the different artifacts (looks better in the generated site).
  2. Fix warnings around SCM properties and inheritance of the values it uses.
  3. Fix Javadoc warnings that were introduced in the 3.1.0 release.

Elvis ValueExpression

We cannot express the logic of the 'elvis operator' by combining existing ValueExpressions.

elvis(ref("a"), ref("b"))

This expression should evaluate to ref("a") if that ValueExpression can successfully be evaluated, and evaluate to ref("b") otherwise.
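A sketch of the elvis semantics, assuming a hypothetical SketchExpression interface in which an empty Optional means the expression could not be evaluated. Metal's real ValueExpression signals this differently, so treat the types here as illustrative only.

```java
import java.util.Optional;

// Hypothetical evaluation contract: an empty Optional means "could not be evaluated".
interface SketchExpression<T> {
    Optional<T> eval();
}

final class ElvisSketch {
    private ElvisSketch() {}

    // elvis(left, right): use left's result if it evaluates, otherwise fall back to right.
    static <T> SketchExpression<T> elvis(final SketchExpression<T> left, final SketchExpression<T> right) {
        return () -> left.eval().or(right::eval);
    }
}
```

Because Optional.or() only invokes its fallback when the left side is empty, the right-hand expression is not evaluated at all when the left-hand one succeeds.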

Let StructSinks be attached dynamically

Right now, a callback can be assigned to a token by wrapping it in a Str. Instead of this static mapping, a dynamic approach would be more useful: it would allow turning callbacks on and off at runtime and adding callbacks to any token at any time.

Together with #11 this will make Str obsolete.
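A sketch of dynamic attachment, assuming a hypothetical SinkRegistry that maps tokens to callbacks at runtime. Keying on object identity (IdentityHashMap) is a design assumption, chosen so that two structurally equal tokens can still carry different callbacks.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical registry that attaches callbacks to tokens at runtime instead of
// wrapping tokens in a Str. Tokens are keyed by identity, not equals().
final class SinkRegistry<K, V> {
    private final Map<K, List<Consumer<V>>> sinks = new IdentityHashMap<>();

    void attach(final K token, final Consumer<V> sink) {
        sinks.computeIfAbsent(token, key -> new ArrayList<>()).add(sink);
    }

    void detach(final K token, final Consumer<V> sink) {
        final List<Consumer<V>> attached = sinks.get(token);
        if (attached != null) {
            attached.remove(sink);
        }
    }

    // Called by the parser after a token has parsed successfully.
    void notifyParsed(final K token, final V result) {
        // Iterate over a copy so a sink may detach itself while being notified.
        for (final Consumer<V> sink : new ArrayList<>(sinks.getOrDefault(token, List.of()))) {
            sink.accept(result);
        }
    }
}
```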

Add the license header to sources that are lacking it

These files are missing the license header:

$ grep -IRiL LICENSE . | grep java
./core/src/main/java/io/parsingdata/metal/expression/value/reference/Len.java
./core/src/test/java/io/parsingdata/metal/data/ParseRefTest.java
./core/src/test/java/io/parsingdata/metal/data/ParseValueTest.java
./core/src/test/java/io/parsingdata/metal/UtilHexTest.java

Refactor to multimodule Maven project

The resulting project would then have:

  • core, the current module
  • tokens, a set of tokens as examples, for practical use and testing
  • utils, common tools and conversions (e.g., #29)

ByOffset's hasGraphAtRef() and findRef() methods are inconsistent

As shown in the fix to #90, the subgraphs returned by ByOffset's hasGraphAtRef() and findRef() methods are inconsistent at best. This needs a serious rethink to arrive at an approach to cycle detection and handling that is both understandable (for users) and implementable.

One idea is to ignore subgraphs that are related to Sub, so as to prevent cycle-prone traversal within cycle detection itself.
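For reference, the standard depth-first technique for cycle detection can be sketched as follows, assuming a hypothetical GraphNode whose refs list models links such as those created by Sub. This is a generic algorithm, not how ByOffset works today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical graph node: a parsed item with outgoing references (for example, via Sub).
final class GraphNode {
    final String name;
    final List<GraphNode> refs = new ArrayList<>();

    GraphNode(final String name) {
        this.name = name;
    }
}

final class CycleCheck {
    private CycleCheck() {}

    // Depth-first search: a node reached again while still on the current path
    // closes a cycle. Identity sets avoid relying on equals()/hashCode().
    static boolean hasCycle(final GraphNode root) {
        return visit(root, newIdentitySet(), newIdentitySet());
    }

    private static boolean visit(final GraphNode node, final Set<GraphNode> onPath, final Set<GraphNode> done) {
        if (onPath.contains(node)) { return true; }
        if (done.contains(node)) { return false; }
        onPath.add(node);
        for (final GraphNode next : node.refs) {
            if (visit(next, onPath, done)) {
                return true;
            }
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    private static Set<GraphNode> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }
}
```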

Add type mechanism for single value list expressions

Examples: In Def and Nod the size expression needs to evaluate to a list with a single value. Parsing will fail otherwise. RepN needs a list with a single value for its n argument.

Is there a way to enforce this using the type system?
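One possible answer, sketched under the assumption of two hypothetical interfaces: ListExpression for expressions that yield a list, and SingleValueExpression for those that yield exactly one value. A Def-like constructor that accepts only SingleValueExpression turns a wrong argument into a compile error instead of a parse failure. None of these names exist in Metal.

```java
import java.util.List;

// Hypothetical: a general expression yields a list of values.
interface ListExpression {
    List<Long> evalAll();
}

// Hypothetical: an expression that always yields exactly one value.
interface SingleValueExpression extends ListExpression {
    long evalOne();

    // A single-value expression is trivially also a one-element list expression.
    @Override
    default List<Long> evalAll() {
        return List.of(evalOne());
    }
}

// Hypothetical Def-like token whose size must be a single value.
final class SizedDef {
    final String name;
    final SingleValueExpression size;

    // Passing a plain ListExpression here is a compile error, not a runtime parse failure.
    SizedDef(final String name, final SingleValueExpression size) {
        this.name = name;
        this.size = size;
    }

    long byteCount() {
        return size.evalOne();
    }
}
```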

Document all public items

  • document Tokens
  • document Shorthands
  • document ComparisonExpressions
  • document LogicalExpressions
  • document ValueExpressions
    ...

Improve coverage on Java-specific things

Test Java-specific constructs such as private constructors and inherited Enum methods. Doing so makes the coverage number more meaningful and makes it easier to spot places where coverage is genuinely lacking in the future.
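A sketch of such a test helper, assuming plain reflection and a hypothetical SketchUtil class and SketchOrder enum standing in for Metal's own types. It invokes the private constructor so the coverage tool records it, and checks that the constructor really is private.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// Hypothetical utility class and enum standing in for Metal's own.
final class SketchUtil {
    private SketchUtil() {}
}

enum SketchOrder { BIG, LITTLE }

final class JavaSpecificCoverage {
    private JavaSpecificCoverage() {}

    // Invoke a private no-argument constructor reflectively so coverage tools mark it
    // as executed, after confirming that it really is private.
    static <T> T invokePrivateConstructor(final Class<T> type) throws ReflectiveOperationException {
        final Constructor<T> constructor = type.getDeclaredConstructor();
        if (!Modifier.isPrivate(constructor.getModifiers())) {
            throw new IllegalStateException(type.getName() + " has a non-private constructor");
        }
        constructor.setAccessible(true);
        return constructor.newInstance();
    }
}
```

Inherited Enum methods such as values() and valueOf() can be covered with a simple round trip over every constant.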

Add API to retrieve a collection's base item

An example:

seq(a, b, c) will create three ParseGraph objects when successfully parsed:

  1. Containing c,
  2. Containing b with c as a child,
  3. Containing a with b as a child.

It is often desirable (e.g., in callbacks or when processing complete results) to only receive the item from 3, which contains the entire collection. An API should be created that makes this possible.

For example, ByToken.getRoot(token) or ByToken.getBase(token).
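A sketch of one way to find the base item, assuming a hypothetical SketchGraph record in which each graph holds a head item and its child as a tail. Among all graphs a collection token produced, the base is the one that is not the child of any other. BaseFinder.getBase() is a hypothetical name, not the ByToken API proposed above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified ParseGraph: a head item plus an optional child graph (tail).
record SketchGraph(String head, SketchGraph tail) {}

final class BaseFinder {
    private BaseFinder() {}

    // Given all graphs a collection token produced, return the one that is not the
    // tail of any other graph: the graph that contains the entire collection.
    static Optional<SketchGraph> getBase(final List<SketchGraph> produced) {
        final Set<SketchGraph> tails = Collections.newSetFromMap(new IdentityHashMap<>());
        for (final SketchGraph graph : produced) {
            if (graph.tail() != null) {
                tails.add(graph.tail());
            }
        }
        return produced.stream().filter(graph -> !tails.contains(graph)).findFirst();
    }
}
```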

Apply code style

Naming and other issues have been resolved (see #82, #58, #59, #76, etc.), and the only style issue left is indentation and whitespace. Apart from other style issues, which essentially follow standard Java conventions, we have arrived at the following decision about one non-standard practice in Metal, namely lines of the following form:

conditional(...) { statement; } and
methodSignature(...) { return statement; }

are only allowed if statement is one of the following:

  1. A return statement returning a simple value (e.g., variable, field, literal), such as return 5; or return value;.
  2. A return statement doing a trivial boolean compare, such as return size != 0; or return !x;.
  3. A throw statement, such as throw new UnsupportedOperationException();

In all other cases, the lines should be broken up according to standard formatting rules, such as:

conditional() {
    statement;
}

Let ValueExpression.eval() return a collection of Values instead of a single Value

This will make it easier to generalize complex operations and predicates. An example is a Zip operation that operates on two collections.

Furthermore, it will make the combinators more modular: instead of having separate First, Last, etc. implementations, Ref can simply return all instances, which can then be narrowed by generic First, Last, etc. modifiers.
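A sketch of the proposed style, assuming lists of plain values and hypothetical first(), last() and zip() helpers. Truncating zip() to the shorter input is an assumption; Metal may handle unequal lengths differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical combinators over lists of values: a ref returns every match,
// and generic modifiers narrow or combine the result afterwards.
final class ListCombinators {
    private ListCombinators() {}

    static <T> List<T> first(final List<T> values) {
        return values.isEmpty() ? List.of() : List.of(values.get(0));
    }

    static <T> List<T> last(final List<T> values) {
        return values.isEmpty() ? List.of() : List.of(values.get(values.size() - 1));
    }

    // Combine two collections pairwise; the result is as long as the shorter input.
    static <T> List<T> zip(final List<T> left, final List<T> right, final BinaryOperator<T> op) {
        final List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(left.size(), right.size()); i++) {
            result.add(op.apply(left.get(i), right.get(i)));
        }
        return result;
    }
}
```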

Improve ParseValue toString()

Currently, the implementation is as follows:

@Override
public String toString() {
    return "ParseValue(" + getName() + ":" + super.toString() + ")";
}

And the current superclass Value toString() method is:

@Override
public String toString() {
    return getClass().getSimpleName() + "(" + DatatypeConverter.printHexBinary(_data) + ")";
}

This results in duplicate class names, such as ParseValue(value:ParseValue(01)), where it would be more logical to have just ParseValue(value:01).
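A sketch of one fix, assuming simplified SketchValue and SketchParseValue classes and Java 17's HexFormat in place of DatatypeConverter. The duplication arises because getClass() returns the runtime class, so the inherited toString() prints ParseValue a second time; moving the hex rendering into a separate method lets the subclass reuse it without the class name.

```java
import java.util.HexFormat;

// Simplified stand-in for Value (the real class holds more state).
class SketchValue {
    protected final byte[] data;

    SketchValue(final byte[] data) {
        this.data = data.clone();
    }

    // Hex rendering of the bytes only, without a class name, so subclasses can reuse it.
    protected String dataToString() {
        return HexFormat.of().withUpperCase().formatHex(data);
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + "(" + dataToString() + ")";
    }
}

// Simplified stand-in for ParseValue.
class SketchParseValue extends SketchValue {
    private final String name;

    SketchParseValue(final String name, final byte[] data) {
        super(data);
        this.name = name;
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + "(" + name + ":" + dataToString() + ")";
    }
}
```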

Self-referencing tokens can only be implemented using anonymous Token classes

The following 'self-referencing page' Token does not compile:

private static final Token PAGE = 
    seq(
        def("pagePayload", 8),
        def("nextPageOffset", 4),
        sub(PAGE, ref("nextPageOffset"))
    );

This does not compile because PAGE is referenced in its own initializer.

As a workaround, define PAGE as an anonymous Token subclass:

private static final Token PAGE = new Token(null) {
    @Override
    protected ParseResult parseImpl(final String scope, final Environment env, final Encoding enc) throws IOException {
        return seq(
                def("pagePayload", 8),
                def("nextPageOffset", 4),
                sub(this, ref("nextPageOffset"))
        ).parse(scope, env, enc);
    }
};

Note the sub(this, ...) construct. This is a trick to reference back to the anonymous PAGE Token instance.

We need a way to express such a self-referencing token.

The main challenge in implementing such a token is that Tokens are not aware of their 'parent' Token, so we cannot locate it.
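One alternative to the anonymous-class workaround is a lazy reference that resolves its target only at parse time. The sketch below assumes hypothetical mini-combinators (Tok, def, seq, sub, mark, lazy) rather than Metal's Token API, with a 1-byte next-page pointer for brevity. The lambda must use the qualified name Pages.PAGE: Java rejects the simple name PAGE as a self-reference in the field's own initializer, even inside a lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical mini-combinators, not Metal's API. parse() returns the offset after
// the parsed data, or -1 on failure, and records each page start in `pages`.
interface Tok {
    int parse(byte[] input, int offset, List<Integer> pages);
}

final class Pages {
    private Pages() {}

    // Consume `size` bytes if they are available.
    static Tok def(final int size) {
        return (input, offset, pages) -> offset + size <= input.length ? offset + size : -1;
    }

    static Tok seq(final Tok... tokens) {
        return (input, offset, pages) -> {
            int current = offset;
            for (final Tok token : tokens) {
                current = token.parse(input, current, pages);
                if (current < 0) {
                    return -1;
                }
            }
            return current;
        };
    }

    // Read a 1-byte pointer; 0 ends the chain, anything else parses target at that offset.
    static Tok sub(final Tok target) {
        return (input, offset, pages) -> {
            if (offset >= input.length) {
                return -1;
            }
            final int pointer = input[offset] & 0xFF;
            if (pointer != 0 && target.parse(input, pointer, pages) < 0) {
                return -1;
            }
            return offset + 1;
        };
    }

    // Record where a page starts without consuming input.
    static Tok mark() {
        return (input, offset, pages) -> {
            pages.add(offset);
            return offset;
        };
    }

    // Resolve the target only when parsing starts, after PAGE has been initialized.
    static Tok lazy(final Supplier<Tok> target) {
        return (input, offset, pages) -> target.get().parse(input, offset, pages);
    }

    // The qualified name Pages.PAGE compiles; the simple name PAGE would be rejected.
    static final Tok PAGE = seq(mark(), def(8), sub(lazy(() -> Pages.PAGE)));
}
```

This sketch does not guard against pages that point back to an earlier page; such input recurses until the stack overflows.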

Check private constructor issue in ParseGraph

The only private constructor in ParseGraph is used only by two helper methods (ByItem.getGraphAfter() and Reversal.reverse(ParseGraph)), which in turn are used only in test code and not in any client code we know of. The solution is therefore to make the constructor private, remove the helper methods, and repair the test code so that it no longer depends on them.

Use an enum for Encoding signedness instead of boolean

Currently we use a boolean to set the signedness of an Encoding:

    public Encoding(final boolean signed) {
        this(signed, DEFAULT_CHARSET, DEFAULT_BYTE_ORDER);
    }
    // ...
    public Encoding(final boolean signed, final Charset charset, final ByteOrder byteOrder) {
        _signed = signed;
        _charset = charset;
        _byteOrder = byteOrder;
    }

Using an enum similar to ByteOrder would be much cleaner and easier to read.

Proposal: Signedness.SIGNED and Signedness.UNSIGNED.
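A sketch of the proposal, assuming a simplified SketchEncoding class; the default charset and byte order chosen here are assumptions, not Metal's actual defaults. The toInteger() helper is hypothetical and only shows where the signedness matters: SIGNED treats the bytes as two's complement, UNSIGNED as a plain magnitude.

```java
import java.math.BigInteger;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Proposed enum replacing the boolean flag.
enum Signedness { SIGNED, UNSIGNED }

// Simplified sketch of an Encoding that takes the enum; the defaults are assumptions.
final class SketchEncoding {
    static final Charset DEFAULT_CHARSET = StandardCharsets.US_ASCII;
    static final ByteOrder DEFAULT_BYTE_ORDER = ByteOrder.BIG_ENDIAN;

    private final Signedness signedness;
    private final Charset charset;
    private final ByteOrder byteOrder;

    SketchEncoding(final Signedness signedness) {
        this(signedness, DEFAULT_CHARSET, DEFAULT_BYTE_ORDER);
    }

    SketchEncoding(final Signedness signedness, final Charset charset, final ByteOrder byteOrder) {
        this.signedness = signedness;
        this.charset = charset;
        this.byteOrder = byteOrder;
    }

    Charset charset() { return charset; }

    // Hypothetical helper: interpret bytes as an integer, honouring byte order and signedness.
    BigInteger toInteger(final byte[] bytes) {
        final byte[] bigEndian = bytes.clone();
        if (byteOrder == ByteOrder.LITTLE_ENDIAN) {
            for (int i = 0, j = bigEndian.length - 1; i < j; i++, j--) {
                final byte tmp = bigEndian[i];
                bigEndian[i] = bigEndian[j];
                bigEndian[j] = tmp;
            }
        }
        return signedness == Signedness.SIGNED ? new BigInteger(bigEndian) : new BigInteger(1, bigEndian);
    }
}
```

Compared with a boolean, a call such as new SketchEncoding(Signedness.UNSIGNED) states its meaning at the call site.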

Remove Str

Remove the Str token. The Str exists for three reasons:

  1. Attach a callback to some token. Addressed by #12.
  2. Create a named scope. Addressed by #11.
  3. Allow retrieving a subgraph of some collection type (e.g. Seq or Rep) without also receiving all smaller subgraphs it contains. Addressed by #88.

Generic error callbacks

Issue #89 added callbacks for Tokens. We need a way to assign callbacks to any error, unrelated to a specific Token instance.
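A sketch of token-independent error callbacks, assuming a hypothetical ParseError record and an ErrorCallbacks registry that the parser would call on every failure. CopyOnWriteArrayList is used so that listeners can be added while errors are being reported.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical description of a parse failure; the fields are illustrative only.
record ParseError(String tokenName, long offset, String message) {}

// Hypothetical global error listener registry, independent of any Token instance.
final class ErrorCallbacks {
    private final List<Consumer<ParseError>> listeners = new CopyOnWriteArrayList<>();

    void addListener(final Consumer<ParseError> listener) {
        listeners.add(listener);
    }

    // Called by the parser whenever any token fails, whichever token it was.
    void reportError(final ParseError error) {
        for (final Consumer<ParseError> listener : listeners) {
            listener.accept(error);
        }
    }
}
```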
