tc39 / proposal-json-parse-with-source
Proposal for extending JSON.parse to expose input source text.
Home Page: https://tc39.github.io/proposal-json-parse-with-source
License: MIT License
Criteria taken from the TC39 process document minus those from previous stages:
- Complete spec text
  - https://github.com/tc39/proposal-json-parse-with-source/blob/master/spec.html
  - https://tc39.es/proposal-json-parse-with-source/
- Designated reviewers have signed off on the current spec text
- The ECMAScript editor has signed off on the current spec text
Consider this example:
print(JSON.stringify(JSON.parse("[1, 2]", function (name, value, context) {
if (name === "0")
this[1] = [ "Hello" ];
return this[name];
})));
JSON.stringify's result is [1,["Hello",["Hello"]]] even before this proposal, because the reviver modified array[1] at runtime and we then visit this newly inserted ["Hello"] array.
But for this array we do not have correct corresponding parse information, since it does not come from the source text: the source information recorded for this property is still that of 2.
Thus, the assertion
i. Assert: typedValNode is an ArrayLiteral.
will crash, since this newly inserted array does not have the right source information: the source information for this property says 2 (it comes from the original source text).
Has this proposal been implemented anywhere yet (e.g. via a polyfill)? Thanks!
Source text access is restricted to primitive values per #10 (comment), so there is currently no information to provide for non-primitive values:
JSON.parse("42", (key, val, context) => context) // => {source: "42"}
JSON.parse("{}", (key, val, context) => context) // => ???
Some possibilities:
- undefined
- an object with no source property ({})
- an object with source: undefined
Forgive me if this is the wrong spot to put this.
I think JSON.rawJSON is a really powerful API for performance-optimizing JSON serialization. But because it is limited to producing only valid primitive JSON, it can't be used for "inline"-ing existing JSON.
I've got a couple of use cases that require feeding pre-serialized objects and arrays into the serialization of outer object trees. For example, in a typical REST API, you might retrieve 10 records from the database and reply with one big JSON array of all of them. Each record might carry a big JSON value, and if those values are large, it performs poorly to deserialize each record's JSON object only to serialize it again to produce the REST API response holding all 10 records. Instead, it'd be great to leave the data as a string when fetching from the database, and then just insert it into the final JSON string produced by JSON.stringify, using JSON.rawJSON to wrap each of these strings.
Without this capability, one has to resort either to manually cobbling together JSON strings, which is far less performant and correct than using the engine's built-in capabilities, or to always deserializing just to serialize again. Userland implementations like json-stream-stringify are far, far slower, and at least in my case the JSON objects are really big, so deserializing and reserializing is a major performance issue.
I presume there is a justification for limiting what can go through JSON.rawJSON, but what is it? And could there ever be a trusted mode, or some sort of escape hatch, where for very performance-sensitive use cases any old string could be sent along?
One other note: this low-level API could really assist with performance optimization by avoiding re-serialization of values you already have the source JSON string for, but as currently specified it can't, because it performs its safety check by parsing the string anyway. That seems correct but inefficient, again suggesting that an escape hatch for the brave would be valuable. Notably, [[IsRawJSON]] being an internal slot means that userland can't create its own raw JSON objects and pay the complexity/reliability price.
This thread is slightly off-topic, as the real-world scenario is JSON parsing in C#/Unity rather than JavaScript, but it might be informative.
Context:
RogueTech is a community mod for the RPG video game BATTLETECH, and is infamous for taking forever to start up and load - around 4 minutes on a 2018 Mac mini.
After some instrumentation patching, I found that the game plus mod calls JSONSerializationUtility.RehydrateObjectFromDictionary() 930,000 times from program startup to loading a saved game (the game has ~4000 JSON files for various RPG stats).
The ~1 million C# calls to parse JSON may have contributed to the long load time, but I have no proof. I just wanted to make people aware of a real-world application making a high number of JSON-parsing calls and suffering [maybe unrelated] performance issues.
This proposal currently covers only the parsing side, but full round-tripping would also require serialization of e.g. BigInt values as unquoted digit sequences. The committee seemed tepid about including serialization in this proposal, but I still wanted to capture the concept even if it is rejected as expected.
Apologies for revisiting #5 (add an options object to JSON.parse and JSON.stringify).
I think this proposal's primary motivation is parsing BigInt? IMO an options object like { bigintIfTooBig: true } remains more ergonomic (and less error-prone):
// options-object
JSON.parse(
"[ 1, 12345678901234567890, 3, \"foo\" ]",
{
bigintIfTooBig: true,
bigintDigitLimit: 50 // security-check
}
);
// [ 1, 12345678901234567890n, 3, "foo" ]
// user-function
JSON.parse(
"[ 1, 12345678901234567890, 3, \"foo\" ]",
function (key, val, {source}) {
if (
typeof val === "number"
&& (val >= Number.MAX_SAFE_INTEGER || val <= Number.MIN_SAFE_INTEGER)
) {
// security-check
if (source.length > 50) {
throw new Error("bigint buffer-overflow");
}
return BigInt(source);
}
return val;
}
);
// [ 1, 12345678901234567890n, 3, "foo" ]
roundtripping is more ergonomic as well:
let data;
data = [ 1, 12345678901234567890n, 3, "foo" ];
data = JSON.stringify(
data,
{
allowBigint: true
}
);
// "[ 1, 12345678901234567890, 3, \"foo\" ]"
data = JSON.parse(
data,
{
bigintIfTooBig: true,
bigintDigitLimit: 50
}
);
// [ 1, 12345678901234567890n, 3, "foo" ]
Hello!
Thank you for this proposal and your hard work!
While waiting for the proposal to become generally available, I've implemented a custom JSON parser that is forward-compatible with the proposal. It doesn't look feasible to make it a proper polyfill, considering that a JS implementation will probably be slower and we can't detect usage of the new features, but it's a ponyfill that can be used today where needed.
I'm not sure that I've implemented all the features correctly - it's not that easy to follow the standard - but all the serialization examples should work correctly. I would be very grateful if someone could review the implementation.
Thanks!
Criteria taken from the TC39 process document minus those from previous stages:
- Test262 acceptance tests have been written for mainline usage scenarios, and merged
  TODO
- Two compatible implementations which pass the acceptance tests
  TODO: Add #implementations section to README
- Significant in-the-field experience with shipping implementations, such as that provided by two independent VMs
  TODO: Add #implementations section to README
Bug tickets to track:
- A pull request has been sent to https://github.com/tc39/ecma262 with the integrated spec text
  TODO
- All ECMAScript editors have signed off on the pull request
  TODO
Say I'm working on an agile web project with BigInt where either:
- the schema changes so frequently with each iteration that the schema-based reviver function (key, val, src, keys) {...} becomes tech debt, or
- it has schema-less dictionaries with arbitrary key/value pairs.
In both cases I need an idiot-proof, schema-less JSON.parse solution that will preserve integer precision by up-coercing integers to BigInt as needed. Would a cookbook solution look as follows?
require("http").createServer(async function (req, res) {
let result;
let reviver;
reviver = function (ignore, val, src) {
/*
* this [schema-less] reviver will preserve integer-precision
* by returning a bigint if precision-loss is detected
*
* reviver is not responsible for enforcing explicit, number/bigint schemas.
* that is left to user after JSON.parse has done its job.
*/
let bigint;
// ignore non-number case
if (typeof val !== "number") {
return val;
}
// secure against malicious, 1000-digit numbers
if (src.length > 1000) {
throw new Error("encountered number with >1000 digits");
}
// TODO - handle bigdecimal
if (src.indexOf(".") >= 0) {
...
}
try {
bigint = BigInt(src);
// ignore non-integer case
} catch (err) {
return val;
}
// integer precision-loss detected - return bigint
if (BigInt(val) !== bigint) {
return bigint;
}
// return val
return val;
};
result = await ... // read body from http-request-stream
// result = "{\
// \"dict\": {\
// \"bigdecimal\": 12345678901234567890.1234,\
// \"bigint\": 12345678901234567890,\
// \"float\": 1234.5678,\
// \"int\": 1234,\
// },\
// \"list\": [\
// 12345678901234567890.1234,\
// 12345678901234567890,\
// 1234.5678,\
// 1234\
// ]\
// }"
result = JSON.parse(result, reviver);
// result = {
// "dict": {
// "bigdecimal": ???,
// "bigint": 12345678901234567890n,
// "float": 1234.5678,
// "int": 1234,
// },
// "list": [
// ???,
// 12345678901234567890n,
// 1234.5678,
// 1234
// ]
// }
/*
* reviver is not responsible for enforcing explicit, number/bigint schemas.
* that is left to user after JSON.parse has done its job.
*/
result = ...
}).listen(8080);
This is a placeholder task for the Stage 3 Specification Review feedback from @michaelficarra.
toJSON provides a capability available to any object for controlling its serialization, and a well-known Symbol would extend that capability to include raw output (e.g., JSON.stringify({toJSON(){ return {[Symbol.rawJSON]: longStringOfDigits}; }})).
Alternatively, the mechanism for raw output could be limited to each individual invocation of JSON.stringify, or even to each individual invocation of a replacer function, either of which would have the potential benefit of coupling it to specifically expressed author intent.
I want to revive a suggestion I made the last time I saw a proposal to tweak ES JSON processing: tc39/proposal-well-formed-stringify#4 (comment)
There is nothing particularly sacred about the JSON/JavaScript mapping imposed by JSON.parse and JSON.stringify. They would both greatly benefit from an options object that allowed more control over the mapping (for example, enabling built-in internalization of large integers as BigInts, or the features discussed in the present proposal).
I think introducing such an options object is the right place to start working on JSON processing extensions.
In #12 (comment), it's proposed that replacer functions would be provided a unique-per-invocation rawTag symbol, which would be used like:
function replacer(key, val, {rawTag}) {
if ( typeof val !== "bigint" ) return val;
return {[rawTag]: String(val)};
}
assert.strictEqual(JSON.stringify([1n], replacer), "[1]");
I like the design goals but the design itself is somewhat clunky to use. Can I propose instead providing a per-invocation function which would perform the marking? That is:
function replacer(key, val, {raw}) {
if ( typeof val !== "bigint" ) return val;
return raw(String(val));
}
assert.strictEqual(JSON.stringify([1n], replacer), "[1]");
I think that ends up being a lot nicer to use while accomplishing the same goals. It's also more in line with what I've seen in the ecosystem when the problem of marking particular values arises in other contexts - e.g. the tag function in this library.
Under the hood, I'm fine if the implementation of raw is to return an object with a particular unique-per-invocation symbol-named property, though it's probably nicer to have it return some new opaque object (and have JSON.stringify throw if it encounters such an opaque object produced by something other than the current invocation's raw).
This proposal currently suggests invoking JSON.parse reviver functions with additional arguments, but @erights noted that he cannot remember any spec change that added arguments to either a built-in function or a callback invoked by a built-in function. The specification itself has nothing to say on this issue; a note recommending that implementations not add custom parameters to built-in functions is the closest I could find:
Implementations that add additional capabilities to the set of built-in functions are encouraged to do so by adding new functions rather than adding new parameters to existing functions.
If the committee is uncomfortable with providing new arguments to JSON.parse reviver functions, then we would need to shift to something like JSON.parseWithSource(source, reviver) or JSON.parseWithSource(source, options).
This is a placeholder task for the Stage 3 Specification Review feedback from @waldemarhorwat.
We noticed that this proposal doesn't contain any spec text and is proposed for stage two. Without spec text, this doesn't meet the criteria for stage two: https://tc39.es/process-document/
String.prototype.replace passes position and input arguments to replacer functions, and the return value of RegExp.prototype.exec has "index" and "input" properties; JSON.parse could behave similarly.
const input = '\n\t"use\\u0020strict"';
let spied;
const parsed = JSON.parse(input, (key, val, context) => (spied = context, val));
parsed === 'use strict';
// → true
spied.source === '"use\\u0020strict"';
// → true
spied.index === 2;
// → true
spied.input === input;
// → true
As noted by @rbuckton, this could be useful for error reporting.
@waldemarhorwat notes that it is not correct to expect AssignmentExpression Parse Nodes from parsing "as a JSON text as specified in ECMA-404". Would it be correct to reference value Parse Nodes (cf. ECMA-404)?
In JSON, -1 is parsed as a single number token. But in ECMAScript, it is parsed as a UnaryExpression consisting of - and a NumericLiteral. ShallowestContainedJSONValue should therefore characterize numbers as |UnaryExpression| rather than |NumericLiteral|, so that InternalizeJSONProperty, when defining a "source" property using the source text matched by parseNode, correctly sets the value to a string containing any negation prefix.
This proposal currently includes JSON.rawJSON and JSON.isRawJSON, but another possibility would be JSON.rawText and JSON.isRawText (although those would be somewhat less clear if extracted off the JSON namespace, e.g. const { isRawText } = JSON).
Consider the example below:
reviver = function(p, v) {
if (p == "a") {
this.b = { get x() {return null}, set x(_){throw 666} }
}
return v;
}
JSON.parse('{"a":0,"b":1}', reviver);
According to the spec, when calling InternalizeJSONProperty({a: 0, b: { get x() {return null}, set x(_){throw 666} }}, 'b', reviver, 1):
// According to step 5:
val = { get x() {return null}, set x(_){throw 666} }
// According to step 6.c, this will fail: a NumericLiteral Parse Node is not an ObjectLiteral Parse Node.
Assert(1 is ObjectLiteral Parse Node)
In many cases, JSON parsing happens indirectly, when an application calls the Response.json() function after fetching a JSON payload from a service using the Fetch API.
Would it be within the scope of this proposal to enhance Response.json() to add support for a reviver option?
Or is the recommendation to use Response.text() and then call JSON.parse directly with a reviver function?
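The Response.text() route mentioned above can be sketched as follows (jsonWithReviver is a hypothetical helper name, and the doubling reviver is purely illustrative):

```javascript
// Workaround sketch: Response.json() accepts no reviver, so read the body as
// text and run JSON.parse over it manually.
async function jsonWithReviver(response, reviver) {
  const text = await response.text();
  return JSON.parse(text, reviver);
}

// Usage with a constructed Response (normally the result of fetch()):
const res = new Response('{"n": 1}');
jsonWithReviver(res, (key, val) => typeof val === "number" ? val * 2 : val)
  .then(obj => console.log(obj.n)); // → 2
```

The cost relative to Response.json() is holding the full body text in memory before parsing, which is unavoidable for JSON.parse anyway.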
Thank you for advancing this proposal! We at Esri are tracking this proposal as part of adding support for big integers to our ArcGIS Online Cloud platform. Not being able to serialize and de-serialize is a blocker at the moment.
const intToBigInt = (key, val, {source}) => typeof val === "number" && val % 1 === 0 ? BigInt(source) : val;
AFAICT, this code is going to fail for (e.g.) source === '123e2':
> BigInt('123e2')
SyntaxError: Cannot convert 123e2 to a BigInt
What if, instead of providing a string as the 3rd argument, it was an object? This would make future enhancement easy and could eventually provide some useful methods:
class JSONParseContext {
readonly source: string;
readonly keys: (string | number)[];
readonly startPosition: number;
readonly endPosition: number;
// ... more properties allowed in the future
}
A reviver function sees values bottom-up, but the data structure hierarchy is already known and can be supplied to it, with or without the phantom leading empty string.
const input = '{ "foo": [{ "bar": "baz" }] }';
const expectedKeys = ['foo', 0, 'bar'];
let spiedKeys;
JSON.parse(input, (key, val, {keys}) => (spiedKeys = spiedKeys || keys, val));
expectedKeys.length === spiedKeys.length;
// → true
expectedKeys.every((key, i) => spiedKeys[i] === key);
// → true
This kind of context can be important for location-dependent reviving (e.g., converting a member like "timeCreated": "2020-02-06T02:24Z" to a Date instance when it appears in the representation of a REST resource, but not when it appears in the representation of a collection of freeform key-value tags).