nim-toml-serialization

License: MIT or Apache 2.0. Stability: experimental.

Flexible TOML serialization [not] relying on run-time type information.

Overview

nim-toml-serialization is a member of the nim-serialization family and provides several operation modes:

  • Decode into Nim data types without any intermediate steps, using only a subset of TOML.
    • Unlike a typical lexer-based parser, nim-toml-serialization is very efficient because the parser converts text directly into Nim data types and produces no intermediate tokens.
  • Decode into Nim data types mixed with TomlValueRef to parse any valid TOML value.
    • Using TomlValueRef offers more flexibility but also requires more memory. If you can avoid using a dotted key, there is no reason to use TomlValueRef.
  • Decode into TomlValueRef from any valid TOML.
  • Encode Nim data types into a subset of TOML.
  • Encode TomlValueRef into full spec TOML.
  • Both encoder and decoder support keyed mode.
  • Allow skipping unknown fields using the TomlUnknownFields flag.
    • Skipping unknown fields is also done efficiently, with no tokens produced. Skipped fields must still contain valid TOML values, or the parser will raise an exception.
  • Since v0.2.1, you can choose to use OrderedTable instead of Table when parsing into TomlValueRef, using the -d:tomlOrderedTable compile-time switch.
  • Since v0.2.3, compile-time decode/loadFile is allowed. This means you can initialize a const value using decode or loadFile. It is also fine to use them inside a static block or other Nim VM code.
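
The compile-time mode from the last item can be sketched as follows. This is a minimal illustration that assumes v0.2.3 or later; the `Server` type and the TOML text are made up for the example and are not part of the library.

```nim
import toml_serialization

type
  Server = object   # hypothetical type, for illustration only
    name: string
    port: int

const rawToml = """
name = "TOML Server"
port = 8005
"""

# `const` forces the decode to run in the Nim VM at compile time.
const server = Toml.decode(rawToml, Server)

static:
  # Checked while compiling, not at run time.
  doAssert server.port == 8005

echo server.name
```

If the TOML text is malformed, the failure should surface as a compile error rather than a run-time exception, which is the main attraction of this mode for configuration baked into a binary.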

Note
On Windows, you might need to increase the stack size because nim-toml-serialization uses the stack to pass the object around. Example: add --passL:"-Wl,--stack,8388608" to your command line when running the compiler. You only need to do this if the object you are serializing can produce deep recursion.

Spec compliance

nim-toml-serialization implements the TOML v1.0.0 spec and passes these test suites:

Nonstandard features

  • According to the spec, TOML key comparison is case sensitive, and this is the default mode for both the encoder and the decoder. nim-toml-serialization also supports:

    • Case insensitive key comparison.
    • Nim ident sensitivity key comparison mode (only the first char is case sensitive).

    TOML keys support Unicode chars, but the comparison modes mentioned above only apply to ASCII chars.

  • A TOML inline table disallows newlines inside the table. nim-toml-serialization provides a switch to allow newlines in an inline table via TomlInlineTableNewline.

  • The TOML standard does not support the \xHH escape sequence, only \uHHHH or \UHHHHHHHH. Set the TomlHexEscape flag to enable it; otherwise the parser raises an exception.

  • The TOML standard requires time in HH:MM:SS format. The TomlHourMinute flag also allows the HH:MM format.

  • The TOML standard requires array elements to be separated by commas; whitespace is ignored. Due to a bug, however, array and inline table elements can be separated by either commas or whitespace. Set the TomlStrictComma flag to parse in strict mode; strict mode is off by default.
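
As a rough sketch of how one of these flags is passed, the snippet below enables the nonstandard \xHH escape. It assumes the `flags = {...}` parameter works for `TomlHexEscape` the same way it does for `TomlInlineTableNewline` elsewhere in this README; the `Message` type is hypothetical.

```nim
import toml_serialization

type
  Message = object   # hypothetical type, for illustration only
    text: string

# The TOML text is:  text = "\x41\x42"   (nonstandard \xHH escapes)
const rawToml = "text = \"\\x41\\x42\""

# Without the flag the decoder is documented to raise an exception;
# with TomlHexEscape the escapes are accepted.
let msg = Toml.decode(rawToml, Message, flags = {TomlHexEscape})
echo msg.text
```

Several flags can presumably be combined in one set, for example `{TomlHexEscape, TomlHourMinute}`, since `flags` is an ordinary Nim set.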

Keyed mode

When decoding, only objects, tuples, or TomlValueRef are allowed at the top level. All other Nim basic data types such as floats, ints, arrays, and booleans must be the value of a key.

nim-toml-serialization offers keyed mode decoding to overcome this limitation. The parser can skip any non-matching key-value pair efficiently because the parser produces no token but at the same time can validate the syntax correctly.

[server]
  name = "TOML Server"
  port = 8005

var x = Toml.decode(rawToml, string, "server.name")
assert x == "TOML Server"

or

var y = Toml.decode(rawToml, string, "server.name", caseSensitivity)

where caseSensitivity is one of:

  • TomlCaseSensitive
  • TomlCaseInsensitive
  • TomlCaseNim

The key must be a valid TOML basic key, quoted key, or dotted key.

Gotcha:

server = { ip = "127.0.0.1", port = 8005, name = "TOML Server" }

It may be tempting to use keyed mode for the above example like this:

var x = Toml.decode(rawToml, string, "server.name")

But it won't work because the grammar of TOML makes it very difficult to exit from the inline table parser in a clean way.
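
One way around this is to decode the enclosing document into an object and read the field from the result. The sketch below assumes an inline table decodes into an object field just as a regular table does; `Server` and `Config` are hypothetical types introduced for the example.

```nim
import toml_serialization

type
  Server = object   # hypothetical type mirroring the inline table
    ip: string
    port: int
    name: string

  Config = object   # hypothetical wrapper for the top-level document
    server: Server

const rawToml = """
server = { ip = "127.0.0.1", port = 8005, name = "TOML Server" }
"""

# Decode the whole document, then pick the field out of the object.
let cfg = Toml.decode(rawToml, Config)
echo cfg.server.name
```

Decoding into TomlValueRef is the other documented route when the shape of the inline table is not known in advance.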

Decoder

  type
    NimServer = object
      name: string
      port: int

    MixedServer = object
      name: TomlValueRef
      port: int

    StringServer = object
      name: string
      port: string

  # decode into native Nim
  var nim_native = Toml.decode(rawtoml, NimServer)

  # decode into mixed Nim + TomlValueRef
  var nim_mixed = Toml.decode(rawtoml, MixedServer)

  # decode any value into string
  var nim_string = Toml.decode(rawtoml, StringServer)

  # decode any valid TOML
  var toml_value = Toml.decode(rawtoml, TomlValueRef)

Parse inline table with newline

# This is a nonstandard toml

server = {
  ip = "127.0.0.1",
  port = 8005,
  name = "TOML Server"
}

  # turn on newline in inline table mode
  var x = Toml.decode(rawtoml, Server, flags = {TomlInlineTableNewline})

Load and save

  var server = Toml.loadFile("filename.toml", Server)
  var ip = Toml.loadFile("filename.toml", string, "server.ip")

  Toml.saveFile("filename.toml", server)
  Toml.saveFile("filename.toml", ip, "server.ip")
  Toml.saveFile("filename.toml", server, flags = {TomlInlineTableNewline})

TOML we can['t] do

  • Date Time. TOML date time format is described in RFC 3339. When parsing TOML date time, use string, TomlDateTime, or TomlValueRef.

  • Date. You can parse TOML date using string, TomlDate, TomlDateTime, or TomlValueRef.

  • Time. You can parse TOML time using string, TomlTime, TomlDateTime, or TomlValueRef.

  • Heterogeneous array. When parsing a heterogeneous array, use string or TomlValueRef.

  • Floats. Floats should be implemented as IEEE 754 binary64 values. The standard TOML float is float64. When parsing floats, use string, TomlValueRef, or SomeFloat.

  • Integers. TOML expects integers in the 64-bit signed (long) range (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807). When parsing integers, use string, SomeInteger, or TomlValueRef.

  • Array of tables. An array of tables can be parsed via TomlValueRef or parsed as a field of object. Parsing with keyed mode also works.

  • Dotted key. When parsing into a nim object, the key must not be dotted. The dotted key is supported via keyed decoding or TomlValueRef.

Option[T]

Option[T] works as usual.
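
A minimal sketch of an `Option[T]` field, assuming the built-in support mentioned above; the `Server` type is hypothetical. The behavior for a key that is absent from the input is not shown here.

```nim
import std/options
import toml_serialization

type
  Server = object   # hypothetical type, for illustration only
    name: string
    port: Option[int]

const rawToml = "name = \"TOML Server\"\nport = 8005\n"

let s = Toml.decode(rawToml, Server)

# The value arrives wrapped in an Option and is unwrapped with `get`.
if s.port.isSome:
  echo "port: ", s.port.get
```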

Bignum

TOML integers max out at int64, but nim-toml-serialization can extend this to arbitrary-precision bignums. Parsing a bignum is done via the helper function parseNumber.

# This is an example of how to parse bignum with `parseNumber` and `stint`.

import stint, toml_serialization

proc readValue*(r: var TomlReader, value: var Uint256) =
  try:
    var z: string
    let (sign, base) = r.parseNumber(z)

    if sign == Sign.Neg:
      raiseTomlErr(r.lex, errNegateUint)

    case base
    of base10: value = parse(z, Uint256, 10)
    of base16: value = parse(z, Uint256, 16)
    of base8:  value = parse(z, Uint256, 8)
    of base2:  value = parse(z, Uint256, 2)
  except ValueError as ex:
    raiseUnexpectedValue(r.lex, ex.msg)

var z = Toml.decode("bignum = 1234567890_1234567890", Uint256, "bignum")
assert $z == "12345678901234567890"

Table

Decoding a table can be achieved via the parseTable template. To parse the value, you can use one of the helper functions or use readValue.

A table can be used to parse the top-level value, a regular table, or an inline table, just like an object.

No built-in readValue is provided for tables; you must overload it yourself depending on your needs.

The table can be a stdlib Table, an OrderedTable, a TableRef, or any table-like data type.

proc readValue*(r: var TomlReader, table: var Table[string, int]) =
  parseTable(r, key):
    table[key] = r.parseInt(int)
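
The overload above can then be exercised through an object field. This is a sketch under the assumption that the user-defined `readValue` is picked up when the enclosing object is decoded; `Config` and its `limits` field are hypothetical names.

```nim
import std/tables
import toml_serialization

# Same overload as shown above: each key maps to a TOML integer.
proc readValue*(r: var TomlReader, table: var Table[string, int]) =
  parseTable(r, key):
    table[key] = r.parseInt(int)

type
  Config = object   # hypothetical type, for illustration only
    limits: Table[string, int]

const rawToml = """
[limits]
small = 1
large = 100
"""

let cfg = Toml.decode(rawToml, Config)
echo cfg.limits["large"]
```

The overload has to be declared before the `Toml.decode` call so that it is in scope when the decoder is instantiated for `Config`.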

Sets and list-like

Similar to Table, sets and list-like or array-like data structures can be parsed using the parseList template. It comes in two flavors, indexed and non-indexed.

A built-in readValue for regular seq and array is implemented for you. No built-in readValue is provided for sets or set-like types; you must overload it yourself depending on your needs.

type
  HoldArray = object
    data: array[3, int]

  HoldSeq = object
    data: seq[int]

  WelderFlag = enum
    TIG
    MIG
    MMA

  Welder = object
    flags: set[WelderFlag]

proc readValue*(r: var TomlReader, value: var HoldArray) =
  # parseList with index, `i` can be any valid identifier
  r.parseList(i):
    value.data[i] = r.parseInt(int)

proc readValue*(r: var TomlReader, value: var HoldSeq) =
  # parseList without index
  r.parseList:
    let lastPos = value.data.len
    value.data.setLen(lastPos + 1)
    readValue(r, value.data[lastPos])

proc readValue*(r: var TomlReader, value: var Welder) =
  # populating set also okay
  r.parseList:
    value.flags.incl r.parseEnum(WelderFlag)

Enums

There are no enums in the TOML specification. The reader/decoder can parse both the ordinal and the string representation of an enum. The writer/encoder, on the other hand, only has a built-in ordinal writer. That is not a limitation: you can always overload writeValue to produce whatever representation of the enum you need.

The ordinal representation of an enum is a TOML integer. The string representation is a TOML basic string or literal string. Multi-line basic strings (e.g. """TOML""") and multi-line literal strings (e.g. '''TOML''') are not allowed for enum values.

# fruits.toml
fruit1 = "Apple"   # basic string
fruit2 = 1         # ordinal value
fruit3 = 'Orange'  # literal string

type
  Fruits = enum
    Apple
    Banana
    Orange

  FruitBasket = object
    fruit1: Fruits
    fruit2: Fruits
    fruit3: Fruits

var x = Toml.loadFile("fruits.toml", FruitBasket)
assert x.fruit1 == Apple
assert x.fruit2 == Banana
assert x.fruit3 == Orange

# write enum output as a string
proc writeValue*(w: var TomlWriter, val: Fruits) =
  w.writeValue $val

let z = FruitBasket(fruit1: Apple, fruit2: Banana, fruit3: Orange)
let res = Toml.encode(z)
assert res == "fruit1 = \"Apple\"\nfruit2 = \"Banana\"\nfruit3 = \"Orange\"\n"

You can control the reader behavior when deserializing a specific enum using configureTomlDeserialization.

```Nim
configureTomlDeserialization(
    T: type[enum], allowNumericRepr: static[bool] = false,
    stringNormalizer: static[proc(s: string): string] = strictNormalize)
```

## Helper functions
  - `parseNumber(r: var TomlReader, value: var string): (Sign, NumberBase)`
  - `parseDateTime(r: var TomlReader): TomlDateTime`
  - `parseString(r: var TomlReader, value: var string): (bool, bool)`
  - `parseAsString(r: var TomlReader): string`
  - `parseFloat(r: var TomlReader, value: var string): Sign`
  - `parseTime(r: var TomlReader): TomlTime`
  - `parseDate(r: var TomlReader): TomlDate`
  - `parseValue(r: var TomlReader): TomlValueRef`
  - `parseEnum(r: var TomlReader, T: type enum): T`
  - `parseInt(r: var TomlReader, T: type SomeInteger): T`

`parseAsString` can parse any valid TOML value into a Nim string including a mixed array or inline table.

`parseString` returns a tuple:
  - field 0:
    - false: is a single line string.
    - true: is a multi-line string.
  - field 1:
    - false: is a basic string.
    - true: is a literal string.

`Sign` can be one of:
  - `Sign.None`
  - `Sign.Pos`
  - `Sign.Neg`

## Implementation specifics
TomlTime contains a subsecond field. The spec says the precision is implementation-specific.

In nim-toml-serialization the default is 6 digits precision.
Longer precision will be truncated by the parser.

You can override this using compiler switch `-d:tomlSubsecondPrecision=numDigits`.

## Installation

You can install the development version of the library through Nimble with the following command:

nimble install https://github.com/status-im/nim-toml-serialization@#master


or install the latest release version

nimble install toml_serialization


## License

Licensed and distributed under either of

* MIT license: [LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT

or

* Apache License, Version 2.0, ([LICENSE-APACHEv2](LICENSE-APACHEv2) or http://www.apache.org/licenses/LICENSE-2.0)

at your option. This file may not be copied, modified, or distributed except according to those terms.

## Credits

A portion of the toml decoder was taken from PMunch's [`parsetoml`](https://github.com/NimParsers/parsetoml)

nim-toml-serialization's Issues

Garbage reads with `readValue` for a custom type

Not sure if I'm doing something wrong, but with the following code I get some garbage in the title field. Am I using readValue wrong? Surely we read from somewhere we shouldn't.

import strutils
import toml_serialization

type
  Item* = object
    id*: string
    txt*: string
  Items* = seq[Item]
  Group* = object
    title*: string
    items*: Items

const toml = """title = "GroupTitle"
items = """ & "\"\"\"" & "\nX;test1\nY;test2\nZ;test3" & "\"\"\""

proc parseItems(s: string): Items =
  for l in s.splitLines():
    if l != "":
      let parts = l.split(';')
      result.add Item(id: parts[0], txt: parts[1])

proc readValue*(r: var TomlReader, items: var Items)=
  let s = parseAsString(r)
  items = parseItems(s)

let group = Toml.decode(toml, Group)
echo group.title

For this data group.title is always Y test1 + a few garbage chars. In the real project I get Illegal storage access.

Improvement: better pretty printing for nested structures

The library makes a good attempt at pretty printing which works well for structures having only 2 levels of nesting. But for more deeply nested structure the current pretty printing algorithm is lacking.

Consider the following deeply nested structure:

[grid]
  colors = {fg = {r = 0.01568627543747425, g = 0.0117647061124444, b = 0.007843137718737125, a = 1.0}, bg = {r = 0.03921568766236305, g = 0.0470588244497776, b = 0.1607843190431595, a = 0.0}}
  grid = 3
  x = 42.42
  flag = true

This isn't easy to read or manually modify. I think the default pretty printing behaviour should be as shown below:

[grid]
  colors = {
    fg = {
      r = 0.01568627543747425,
      g = 0.0117647061124444,
      b = 0.007843137718737125,
      a = 1.0
    },
    bg = {
      r = 0.03921568766236305,
      g = 0.0470588244497776,
      b = 0.1607843190431595,
      a = 0.0
    }
  }
  grid = 3
  x = 42.42
  flag = true

We could then get a bit fancy and allow a max line length to be specified as an encoding parameter, so that leaf-level objects would still be condensed into a single line if that would not exceed the max line length.

I think this is an important issue because one of the major selling points of TOML is readability and easy modifiability by humans.

Refactor list

  • remove lex.push and use more inputStream.peek.
  • replace toHex with no alloc version.
  • replace intToStr with no alloc version.
  • replace $ with no alloc version.
  • replace TomlTime.subsecond parser with no alloc version.
  • replace key normalization with no alloc version.

missing pieces

  • keyed mode for Toml.encode
  • keyed mode for Toml.saveFile.
  • query TomlValueRef using path e.g: value[path, type].
  • misc TomlValueRef operations.
  • explore the possibility to parse array of table directly into Nim data types.
  • encode array of table if we can parse it.
  • keyed mode for table array.
  • should we support stdlib table or provide helper proc? -- via helper proc

Library errors with devel after stricteffects enabled

Compiling the first README example with current devel throws an error.

example file:

import toml_serialization

let rawToml = """
[server]
  name = "TOML Server"
  port = 8005
"""
var x = Toml.decode(rawToml, string, "server.name")
assert x == "TOML Server"

error reported:

C:\Users\ppeterlongo\.nimble\pkgs\toml_serialization-0.2.3\toml_serialization\types.nim(172, 25) template/generic instantiation of `==` from here
C:\Users\ppeterlongo\.choosenim\toolchains\nim-#devel\lib\system\comparisons.nim(309, 6) Error: '==' can have side effects
sideEffect` '=='
>> C:\Users\ppeterlongo\.nimble\pkgs\toml_serialization-0.2.3\toml_serialization\types.nim(144, 6) Hint: '==' called by '=='

see also nim-lang/Nim#20697

it seems adding a noSideEffects annotation to == should fix this. I will check if that is indeed the case and come back with a PR.

`nimble install` doesn't install `faststreams`

Hello,

I am trying out this package to convert TOML files to Nim objects and I was successful after fixing a minor installation glitch.

In order to run Toml.decode, I need to do

nimble install faststreams
nimble install toml_serialization

If I don't install faststreams, I get this error:

/home/kmodi/.nimble/pkgs2/serialization-0.1.0-d94373efee43cbd15b1cc73185c0e49528598b37/serialization.nim(3, 33) Error: cannot open file: faststreams/inputs
make[1]: *** [nim] Error 1

Nim version is installed from devel:

Nim Compiler Version 1.9.3 [Linux: amd64]
Compiled at 2023-04-10
Copyright (c) 2006-2023 by Andreas Rumpf

git hash: 4d683fc689e124cfb0ba3ddd6e68d3e3e9b9b343
active boot switches: -d:release

Suggested fix

Can nimble install toml_serialization install faststreams as a dependency?

Improvement: handling of enum types

It would be nice to have an option to encode/decode enum types with their displayable name instead of their ordinal value. In most cases the ordinal value is of no use for readability or for manually changing configs.

I think this is an important issue because one of the major selling points of TOML is readability and easy modifiability by humans.

Happy to give the implementation of this a go if you guys think it's a good idea.

parsing a table

Hi
I am currently using your library for a project, but I am having some trouble understanding how to write a proc for parsing some values as a Table.

Could you help me understand how to do it?

My setup is as follows:

The toml

[database]
applicationName = "name"
multiSubnetFailover = true

[database.connectionStrings.debug]
server = "localhost"
userId = "user
password = "pass"

The nim objects

type ConfigDatabaseConnectionString* = object
  server*: string
  userId*: string
  password*: string

type ConfigDatabase* = object
  applicationName*: string
  multiSubnetFailover*: bool
  connectionStrings*: Table[string, ConfigDatabaseConnectionString]

I have tried having a combination of the following proc but they all result in an error.

proc readValue*(r: var TomlReader, item: var ConfigDatabaseConnectionString) =
  parseTable(r, key):
    item.server = r.parseString(string, "server")
    item.userId = r.parseString(string, "userId")
    item.password = r.parseString(string, "password")

proc readValue*(r: var TomlReader, table: var Table[string, ConfigDatabaseConnectionString]) =
  parseTable(r, key):
    table[key].server = r.parseString(string, "server")
    table[key].userId = r.parseString(string, "userId")
    table[key].password = r.parseString(string, "password")

proc readValue*(r: var TomlReader, table: var Table[string, ConfigDatabaseConnectionString]) =
  parseTable(r, key):
    table[key].server = r.parseString(string)
    table[key].userId = r.parseString(string)
    table[key].password = r.parseString(string)

However if I copy over the example from https://nimble.directory/pkg/tomlserialization then it does not fail for that one.

So clearly I simply don't understand how to write the method.

Investigate the feasibility of adding flavor-like mechanism

As the usage of toml-serialization grows beyond a mere config file parser, there is a potential for serializer clashes between different subsystems using toml-serialization.

A lesson learned from the json-serialization library: a flavor-like mechanism is a good candidate to prevent such clashes or contamination.

The granularity of the flavor should be configurable by the user.
Backward compatibility is mandatory. The stability, flexibility, and simplicity of toml-serialization are what draw more users.

hint [XCannotRaiseY] when using the library

I am using this library in this project: https://github.com/pietroppeter/nimib

I have seen that, when it is used, it raises the following [XCannotRaiseY] hints multiple times.

It is not a big issue for a user (we could always silence this specific hint), but I thought you might want to know (maybe you already do).

/home/runner/.nimble/pkgs/toml_serialization-0.2.3/toml_serialization/reader.nim(416, 59) Hint: 'readValue' cannot raise 'ValueError' [XCannotRaiseY]
/home/runner/.nimble/pkgs/toml_serialization-0.2.3/toml_serialization/reader.nim(416, 71) Hint: 'readValue' cannot raise 'Defect' [XCannotRaiseY]
/home/runner/.nimble/pkgs/serialization-0.1.0/serialization/object_serialization.nim(210, 67) Hint: 'readField' cannot raise 'Defect' [XCannotRaiseY]
/home/runner/.nimble/pkgs/serialization-0.1.0/serialization/object_serialization.nim(210, 67) Hint: 'readField' cannot raise 'Defect' [XCannotRaiseY]
/home/runner/.nimble/pkgs/toml_serialization-0.2.3/toml_serialization/reader.nim(416, 59) Hint: 'readValue' cannot raise 'ValueError' [XCannotRaiseY]
/home/runner/.nimble/pkgs/toml_serialization-0.2.3/toml_serialization/reader.nim(416, 71) Hint: 'readValue' cannot raise 'Defect' [XCannotRaiseY]

(fyi the above log is taken from CI of nimib project, latest commit)

Segfault on `orc` when echo'ing decoded object

When decoding an object on orc it segfaults when echoing the object. It works for refc but not on orc (tested on 1.6.10 and devel). Reproducing code:

import toml_serialization

type NbConfig* = object
  srcDir*, homeDir*: string

let f = readFile("nimib.toml")

let t = Toml.decode(f,  NbConfig, "nimib")

echo t # error here

nimib.toml:

[nimib]
srcDir = "docsrc"
homeDir = "docs"

Output:

Traceback (most recent call last)
/home/hugo/code/nim/nimib/x_toml.nim(10) x_toml
/home/hugo/.choosenim/toolchains/nim-#devel/lib/std/private/miscdollars.nim(25) $
/home/hugo/.choosenim/toolchains/nim-#devel/lib/system/alloc.nim(1043) alloc
/home/hugo/.choosenim/toolchains/nim-#devel/lib/system/alloc.nim(859) rawAlloc
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Segmentation fault (core dumped)

Expected output:

(srcDir: "docsrc", homeDir: "docs")

OrderedTable support

I'm using this library to parse a TOML file in which I describe some tables that will later become graphical elements. It would be great to have something like TomlKind.OrderedTable or TomlFlag.TomlOrderedTable so that the elements appear in the order they were defined.

Compile-time Support

Right now if you try to load (or decode) a TOML file at compile time (using the {.compileTime} pragma or const) you get an error.
I am using TOML for a configuration file and it's important that I can read it at compile time. Not sure if this is possible or if there is a workaround.
Thanks

import toml_serialization

let data {.compileTime.} = Toml.loadFile("data.toml", TomlValueRef)
/home/me/.nimble/pkgs/faststreams-0.3.0/faststreams/inputs.nim(247, 26) Error: cannot evaluate at compile time: memFileInputVTable

Version 0.2.1.
