Curiosity about binarycodable

jverkoey commented on August 28, 2024
Curiosity

from binarycodable.

Comments (3)

jverkoey commented on August 28, 2024

Solid question, and one I'll answer here but likely need to expand upon in the repo's docs. Here's the sequence of events that led me here:

When I first built https://github.com/jverkoey/MySqlConnector/ I used a naive binary decoding implementation that used iterators of [UInt8]/Data to consume the data from a socket. See an example of this here. Snippet included below:

public struct LengthEncodedString {
  public init?(data: Data, encoding: String.Encoding) throws {
    // Empty data is not a length-encoded string.
    if data.isEmpty {
      return nil
    }

    let integer: LengthEncodedInteger
    do {
      guard let integerOrNil = try LengthEncodedInteger(data: data) else {
        return nil
      }
      integer = integerOrNil
    } catch let error {
      if let lengthEncodedError = error as? LengthEncodedIntegerDecodingError {
        switch lengthEncodedError {
        case .unexpectedEndOfData(let expectedAtLeast):
          throw LengthEncodedStringDecodingError.unexpectedEndOfData(expectedAtLeast: expectedAtLeast)
        }
      }
      throw error
    }

    self.length = UInt64(integer.length) + UInt64(integer.value)

    // Check bounds before slicing; an out-of-range Data subscript traps instead of throwing.
    let payloadEnd = integer.length + UInt(integer.value)
    if data.count < payloadEnd {
      throw LengthEncodedStringDecodingError.unexpectedEndOfData(expectedAtLeast: UInt(integer.value))
    }
    let remainingData = data[integer.length..<payloadEnd]

    guard let string = String(data: remainingData, encoding: encoding) else {
      throw LengthEncodedStringDecodingError.unableToCreateStringWithEncoding(encoding)
    }
    self.value = string
  }
}

This implementation was fast and effective, but managing type conversions, bounds checking, and making data iterators became a fairly repetitive pattern. Swift Codable came to mind as a possible improvement, so I began exploring it in jverkoey/MySqlClient#23. You can see a proof of concept in the first commit of that PR.

In essence, I moved the data iterator into a custom Decoder implementation and updated my payloads to conform to Decodable:

public struct LengthEncodedString: Codable {
  public init(from decoder: Decoder) throws {
    var container = try decoder.unkeyedContainer()

    let length = try container.decode(LengthEncodedInteger.self)
    self.length = UInt64(length.length) + UInt64(length.value)

    let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }
    self.value = String(data: Data(stringData), encoding: .utf8)!
  }
}

Quite a bit simpler now, but in doing so I encountered a few concerns about Codable's applicability to binary data solutions, which I've outlined below.

Swift Codable assumes complex external representations are dictionaries

One of the main benefits of Swift Codable is that you can get encoding and decoding of complex types for free. These for-free implementations rely on CodingKeys that must exist in some manner in the external representation. Binary data unfortunately does not always have a concept of a named key, at least not without completely parsing the data representation, which defeats the purpose of the Codable interface.

While Swift's default behavior can be hacked to our benefit by assuming that each property will be decoded in the order in which it was defined — Mike Ash took this approach — I prefer clearly debuggable code when working with binary formats. There are also enough quirks with binary formats that the assumption of Decodable primitives mapping to binary primitives can fall over pretty quickly (length-encoded strings being a good example).

Aside: I do think there is potential in BinaryCodable to provide some for-free implementations of complex types; my thoughts are outlined here: #4.

So in practice, binary representations written with Codable will almost always have to provide an explicit implementation anyway in order to "opt out" of the keyed external representation assumption. This wasn't a deal-breaker, it just meant binary representations wouldn't benefit from Codable's code generation for complex types (somewhat reducing the value of Codable).
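
To make the keyed assumption concrete, here is a minimal sketch using Foundation's JSONEncoder. The PacketHeader type and its fields are hypothetical and exist only for illustration. The point is that the synthesized conformance writes every property under its name via CodingKeys, which a raw byte stream has no natural place for.

```swift
import Foundation

// Hypothetical payload type. The compiler synthesizes CodingKeys,
// encode(to:), and init(from:) from the property names.
struct PacketHeader: Codable, Equatable {
  var sequence: UInt8
  var length: UInt16
}

func encodedHeaderJSON() throws -> String {
  let encoder = JSONEncoder()
  encoder.outputFormatting = .sortedKeys  // deterministic key order
  let data = try encoder.encode(PacketHeader(sequence: 1, length: 512))
  return String(decoding: data, as: UTF8.self)
}

// The external representation carries the property names as keys.
print(try encodedHeaderJSON())  // prints {"length":512,"sequence":1}
```

A format such as JSON can store those keys, so the free implementation works. A binary wire format that is just bytes in a fixed order cannot, which is why a hand-written init(from:) is usually needed there.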

Swift Codable's primitives do not give access to underlying data

This is what ended up being the deal-breaker for me. Let's look again at that length-encoded string implementation using Codable:

public struct LengthEncodedString: Codable {
  public init(from decoder: Decoder) throws {
    var container = try decoder.unkeyedContainer()

    let length = try container.decode(LengthEncodedInteger.self)
    self.length = UInt64(length.length) + UInt64(length.value)

    let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }
    self.value = String(data: Data(stringData), encoding: .utf8)!
  }
}

Particularly this line:

let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }

Swift Codable does not have a primitive for "arbitrary bytes of data", so we're forced to channel all byte encoding/decoding one UInt8 at a time. We could encode/decode one UInt64 at a time, but the implementation then needs to gracefully handle lengths that are not multiples of 8. Either way, this is a substantial CPU bottleneck for larger blocks of data.

Without a healthy way to work with arbitrary blocks of data, Codable's value dipped from "reasonable, given we don't get free code generation" to "negative, given there is now a significant performance penalty".
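
The difference between the two access patterns can be sketched without any Codable machinery. ByteCursor below is a hypothetical type written for this illustration (it is not part of Codable or BinaryCodable): readByte() stands in for container.decode(UInt8.self), while read(length:) stands in for a block-oriented primitive.

```swift
// Hypothetical cursor over an in-memory buffer, for illustration only.
struct ByteCursor {
  struct OutOfData: Error {}

  private let bytes: [UInt8]
  private var offset = 0

  init(_ bytes: [UInt8]) { self.bytes = bytes }

  // One call and one bounds check per byte.
  mutating func readByte() throws -> UInt8 {
    guard offset < bytes.count else { throw OutOfData() }
    defer { offset += 1 }
    return bytes[offset]
  }

  // One bounds check and one copy for the whole block.
  mutating func read(length: Int) throws -> [UInt8] {
    guard length >= 0, length <= bytes.count - offset else { throw OutOfData() }
    defer { offset += length }
    return Array(bytes[offset..<offset + length])
  }
}

let payload: [UInt8] = [0x68, 0x65, 0x6c, 0x6c, 0x6f]  // "hello" in UTF-8

var perByte = ByteCursor(payload)
let slow = try (0..<payload.count).map { _ in try perByte.readByte() }

var block = ByteCursor(payload)
let fast = try block.read(length: payload.count)

print(slow == fast)  // prints true
```

Both paths produce the same bytes. The per-byte path pays for a throwing call and a bounds check on every element, and with Codable each of those calls additionally goes through the container abstraction.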

Swift Codable does not encourage correctness by default for binary representations

This is a minor point, but one I feel is worth mentioning because, on average, I feel Swift is a wonderful language precisely because it encourages correctness.

Swift Codable has three container types: keyed, unkeyed, and singleValue. Binary data does not necessarily benefit from these three layers of abstraction, so in practice all of my binary types were using unkeyed containers to hack the external representation as an array of bytes (using the UInt8 primitive). As such, unkeyed containers are in essence the only "correct" container in Codable for complex binary data, so the availability of incorrect containers was a source of tension for me as I was implementing more complex types.

BinaryCodable's solutions to the above concerns

BinaryCodable takes inspiration from Swift Codable, but makes a few distinct architectural decisions that optimize it for working with binary data:

  1. BinaryCodable is essentially a type-safe, Codable-like equivalent to the C family of file functions, with only fread- and fwrite-like behavior implemented thus far. I may add fseek-like behavior in the future as needed.
  2. Only one container type is provided. This encourages correctness.
  3. There are APIs for encoding and decoding arbitrary blocks of data.
  4. There are APIs for encoding and decoding strings, either terminated or container-bound.
  5. RawRepresentable types do get auto-generated BinaryCodable implementations for free using protocol extensions. Complex types will require some more care and thought.
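
As a rough illustration of the last point, the sketch below shows how a protocol extension can hand RawRepresentable types a decoding initializer for free. The ByteDecodable protocol, the ByteDecodingError type, and the Command enum are all hypothetical names invented for this example; BinaryCodable's actual protocols and signatures differ.

```swift
// Hypothetical protocol for illustration; not BinaryCodable's real API.
protocol ByteDecodable {
  init(bytes: inout ArraySlice<UInt8>) throws
}

struct ByteDecodingError: Error {}

extension UInt8: ByteDecodable {
  init(bytes: inout ArraySlice<UInt8>) throws {
    // Consume one byte from the front of the stream.
    guard let first = bytes.popFirst() else { throw ByteDecodingError() }
    self = first
  }
}

// Any RawRepresentable type whose raw value is decodable
// picks up init(bytes:) from this extension.
extension RawRepresentable where Self: ByteDecodable, RawValue: ByteDecodable {
  init(bytes: inout ArraySlice<UInt8>) throws {
    let raw = try RawValue(bytes: &bytes)
    guard let value = Self(rawValue: raw) else { throw ByteDecodingError() }
    self = value
  }
}

// Hypothetical command codes. No decoding code is written for the enum itself.
enum Command: UInt8, ByteDecodable {
  case quit = 1
  case query = 3
}

var stream: ArraySlice<UInt8> = [3, 1]
let first = try Command(bytes: &stream)
let second = try Command(bytes: &stream)
print(first, second)
```

This is the same mechanism the standard library uses to give raw-value enums their Codable conformance; the sketch only swaps the container for a plain byte slice.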

And finally, this is the BinaryCodable version of the LengthEncodedString implementation:

struct LengthEncodedString: BinaryDecodable {
  init(from decoder: BinaryDecoder) throws {
    var container = decoder.sequentialContainer(maxLength: nil)

    let length = try container.decode(LengthEncodedInteger.self)
    let stringData = try container.decode(length: Int(length.value))
    guard let string = String(data: Data(stringData), encoding: .utf8) else {
      throw BinaryDecodingError.dataCorrupted(.init(debugDescription:
        "Unable to create String representation of data"))
    }
    self.value = string
  }
}

jverkoey commented on August 28, 2024

Love it :) I’ll bring this up in the forums and perhaps take a stab at an evolution doc.

DevAndArtist commented on August 28, 2024

That is great feedback, thank you for that. Judging from a quick glance at the implementation, I still think we could extend Codable to operate on binary data in a way the whole community would prefer and benefit from, and we would get some more language features for free.

I'm pretty sure that if you brought this discussion to the official Swift forums, we could shape a great proposal together with the community to extend that area of Swift and avoid possible bottlenecks. If this went into the stdlib, you would have even more ways to implement certain things at your disposal, since there you can get more compiler support if required to avoid performance penalties.

Such an extension would also spark some discussion about Data, since it's not part of the stdlib, and about whether we could have a superior type for working with binary data. (I've tripped myself up so many times over the fact that Data can be a slice of the original Data instance.)

With all that maybe we would also see more extensions of the stdlib types to provide seamless support to work with binary data. Wouldn't that be great?

That said, your module is not the first that is trying to solve these things in a similar fashion. And since all these solutions kind of overlap (partly) with Codable, maybe it's a great signal to push a general solution and establish a standard in the language itself. :)
