tc39 / proposal-json-parse-with-source

Proposal for extending JSON.parse to expose input source text.

Home Page: https://tc39.github.io/proposal-json-parse-with-source

License: MIT License


proposal-json-parse-with-source's Introduction

JSON.parse source text access

A proposal for extending JSON.parse behavior to grant reviver functions access to the input source text and extending JSON.stringify behavior to support object placeholders for raw JSON text primitives.

2023 September slides

original 2018 September slides

Status

This proposal is at stage 3 of the TC39 Process.

Champions

  • Richard Gibson
  • Mathias Bynens

Motivation

Transformation between ECMAScript values and JSON text is lossy. This is most obvious in the case of deserializing numbers (e.g., "999999999999999999", "999999999999999999.0", and "1000000000000000000" all parse to 1000000000000000000), but it also comes up when attempting to round-trip non-primitive values such as Date objects (e.g., JSON.parse(JSON.stringify(new Date("2018-09-25T14:00:00Z"))) yields the string "2018-09-25T14:00:00.000Z" rather than a Date).
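Both losses are directly observable in any current engine:

// Three distinct JSON texts collapse to the same Number:
JSON.parse("999999999999999999");   // → 1000000000000000000
JSON.parse("999999999999999999.0"); // → 1000000000000000000
JSON.parse("1000000000000000000");  // → 1000000000000000000

// Round-tripping a Date yields a string, not a Date:
JSON.parse(JSON.stringify(new Date("2018-09-25T14:00:00Z")));
// → "2018-09-25T14:00:00.000Z"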

Neither of these examples is hypothetical—serializing a BigInt as JSON is specified to throw an exception because there is no output that would round-trip through JSON.parse, and a similar concept has been raised regarding the Temporal proposal.

JSON.parse accepts a reviver function capable of processing inbound values, but it is invoked bottom-up and receives so little context (a key, an already-lossy value, and a receiver upon which key is an own property with value value) that it is practically useless. We intend to remedy that.
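For example, with the current two-argument reviver, the original digits are already gone by the time user code runs:

// The reviver sees only the post-parse (lossy) value:
JSON.parse('{"big": 999999999999999999}', (key, val) => {
  // For key "big", val is already 1000000000000000000;
  // the source digits are unrecoverable from here.
  return val;
});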

Proposed Solution

Update JSON.parse to provide reviver functions with more arguments, primarily conveying the source text from which a value was derived (inclusive of punctuation but exclusive of leading/trailing insignificant whitespace).
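For instance, under this proposal a reviver observing a string value would see its source text with the enclosing quotes but without the surrounding whitespace (a sketch of the proposed behavior):

JSON.parse('  "abc"  ', (key, val, context) => {
  // val === "abc", while context.source === '"abc"'
  return val;
});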

Serialization

Although not originally included in this proposal, support for non-lossy serialization with JSON.stringify (and thus also complete round-trippability) was requested and added (cf. #12). It currently uses special "raw JSON" frozen objects constructible with JSON.rawJSON, though that mechanism is possibly subject to change (cf. #18 and #19).
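A minimal sketch of the current API shape (subject to the possible changes noted above):

const raw = JSON.rawJSON("999999999999999999");
JSON.isRawJSON(raw);        // → true
Object.isFrozen(raw);       // → true ("raw JSON" objects are frozen)
JSON.stringify({ n: raw }); // → '{"n":999999999999999999}'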

Illustrative examples

// Parse digit-only source text as a BigInt rather than a lossy Number.
const digitsToBigInt = (key, val, {source}) =>
  /^[0-9]+$/.test(source) ? BigInt(source) : val;

// Serialize BigInt values as raw (unquoted) JSON digit sequences.
const bigIntToRawJSON = (key, val) =>
  typeof val === "bigint" ? JSON.rawJSON(String(val)) : val;

const tooBigForNumber = BigInt(Number.MAX_SAFE_INTEGER) + 2n;
JSON.parse(String(tooBigForNumber), digitsToBigInt) === tooBigForNumber;
// → true

const wayTooBig = BigInt("1" + "0".repeat(1000));
JSON.parse(String(wayTooBig), digitsToBigInt) === wayTooBig;
// → true

const embedded = JSON.stringify({ tooBigForNumber }, bigIntToRawJSON);
embedded === '{"tooBigForNumber":9007199254740993}';
// → true

Potential enhancements

Expose position and input information

String.prototype.replace passes position and input arguments to replacer functions and the return value from RegExp.prototype.exec has "index" and "input" properties; JSON.parse could behave similarly.

const input = '\n\t"use\\u0020strict"';
let spied;
const parsed = JSON.parse(input, (key, val, context) => (spied = context, val));
parsed === 'use strict';
// → true
spied.source === '"use\\u0020strict"';
// → true
spied.index === 2;
// → true
spied.input === input;
// → true

Supply an array of keys for understanding value context

A reviver function sees values bottom-up, but the data structure hierarchy is already known and can be supplied to it, with or without the phantom leading empty string.

const input = '{ "foo": [{ "bar": "baz" }] }';
const expectedKeys = ['foo', 0, 'bar'];
let spiedKeys;
JSON.parse(input, (key, val, {keys}) => (spiedKeys = spiedKeys || keys, val));
expectedKeys.length === spiedKeys.length;
// → true
expectedKeys.every((key, i) => spiedKeys[i] === key);
// → true

Discussion

Backwards Compatibility

Conforming ECMAScript implementations are not permitted to extend the grammar accepted by JSON.parse. This proposal does not make any such attempt, and adding new function parameters is among the safest changes that can be made to the language. All input to JSON.parse that is currently rejected will continue to be rejected, all input that is currently accepted will continue to be accepted, and the only changes to its output will be directly controlled by user code.

Modified values

Reviver functions are intended to modify or remove values in the output, but those changes should have no effect on the source-derived arguments passed to them. Because reviver functions are invoked bottom-up, this means that values may not correlate with source text. We consider this to be acceptable, but mostly moot (see the following point). Where not moot (such as when not-yet-visited array indexes or object entries are modified), source text is suppressed.
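A sketch of the scenario described above (the exact context contents for such entries are the spec's concern; this only illustrates when suppression applies):

JSON.parse("[1, 2]", function (key, val, context) {
  if (key === "0") this[1] = 22; // mutate a not-yet-visited index
  if (key === "1") {
    // val is now 22 and no longer matches the source text "2",
    // so per the text above no source is reported for this entry
  }
  return val;
});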

Non-primitive values

Per #10 (comment), source text exposure is limited to primitive values.
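That is, only primitive values carry source text (a sketch; see also the issue below about what the context should contain for non-primitives):

JSON.parse('{"a": 1}', (key, val, context) => {
  // for val === 1:        context.source === "1"
  // for the outer object: no source text is exposed
  return val;
});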

proposal-json-parse-with-source's People

Contributors

gibson042, ljharb, mathiasbynens


proposal-json-parse-with-source's Issues

Bug in `intToBigInt`?

const intToBigInt = (key, val, {source}) =>
  typeof val === "number" && val % 1 === 0 ? BigInt(source) : val;

AFAICT, this code is going to fail for (e.g.) source === '123e2':

> BigInt('123e2')
SyntaxError: Cannot convert 123e2 to a BigInt
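One possible repair (my sketch, not from the issue): only call BigInt when the source text is a plain digit sequence, so exponent and decimal notations fall through to the parsed value.

const intToBigInt = (key, val, {source}) =>
  typeof val === "number" && /^-?[0-9]+$/.test(source) ? BigInt(source) : val;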

performance impact with 1-million json.parse calls

thread is slightly off-topic, as the real-world scenario is parsing JSON in C#/Unity rather than JavaScript, but it might be informative.

context:
RogueTech is a community mod for the RPG video game BATTLETECH, and is infamous for taking forever to start up and load - around 4 minutes on a 2018 Mac mini.

after some instrumentation patching, I found out the game+mod calls JSONSerializationUtility.RehydrateObjectFromDictionary() 930,000 times from program startup to loading a saved game (the game has ~4000 JSON files for various RPG stats).

the 1 million calls in C# to parse JSON may have contributed to the long load time, but I have no proof. I just wanted to make people aware of a real-world application making a high number of JSON-parsing calls and suffering [maybe unrelated] performance issues.

Should reviver functions still get a context argument for non-primitive values?

Source text access is restricted to primitive values per #10 (comment), so there is currently no information to provide for non-primitive values:

JSON.parse("42", (key, val, context) => context) // => {source: "42"}
JSON.parse("{}", (key, val, context) => context) // => ???

Some possibilities:

  1. Do not provide the argument at all.
  2. Provide undefined.
  3. Provide an object with no source property ({}).
  4. Provide an object with source: undefined.

Reviver can modify holder object, like inserting new object / array, which breaks invariant of parsing information

Consider this example:

console.log(JSON.stringify(JSON.parse("[1, 2]", function (name, value, context) {
    if (name === "0")
        this[1] = [ "Hello" ];
    return this[name];
})));

JSON.stringify's result is [1,["Hello",["Hello"]]] even before this proposal, because the reviver modified array[1] at runtime and we then visit the newly inserted ["Hello"] array.
But for this array we have no correct corresponding parse information, since it does not come from the text; the source information for this property is still that of 2.
Thus,

i. Assert: typedValNode is an ArrayLiteral.

will fail, since this newly inserted array does not have the right source information: the source information for this property says 2 (it comes from the original source).

Reusability of raw serialization mechanism

toJSON provides a capability available to any object for controlling its serialization, and a well-known Symbol would extend that capability to include raw output (e.g., JSON.stringify({toJSON(){ return {[Symbol.rawJSON]: longStringOfDigits}; }})).

Alternatively, the mechanism for raw output could be limited to each individual invocation of JSON.stringify or even to each individual invocation of a replacer function, either of which would have the potential benefit of coupling it to specifically-expressed author intent.

What about an object/class as argument?

What if instead of providing a string as the 3rd argument it were an object? This would make future enhancements easy and could eventually provide some useful methods:

class JSONParseContext {
  readonly source: string;
  readonly keys: (string | number)[];
  readonly startPosition: number;
  readonly endPosition: number;
  // ... more properties allowed in the future
}

Extend this proposal to include serialization?

This proposal currently covers only the parsing side, but full round-tripping would also require serialization of e.g. BigInt values as unquoted digit sequences. The committee seemed tepid about including serialization in this proposal, but I still wanted to capture the concept even if it is rejected as expected.

A polyfill/ponyfill implementation

Hello!

Thank you for this proposal and your hard work!

While waiting for the proposal to be generally available, I've implemented a custom JSON parser that is future-compatible with the proposal. It doesn't look feasible to make it a proper polyfill, considering that a JS implementation will probably be slower and we can't detect usage of the new features, but it's a ponyfill that can be used today where needed.

I'm not sure that I've implemented all the features correctly (it's not that easy to follow the standard), but all the serialization examples should work correctly. I would be very grateful if someone could review the implementation.

Thanks!

How to parse bigints when using Response.json() function?

In many cases, JSON parsing happens indirectly, when an application calls the Response.json() function after fetching a JSON payload from a service using the Fetch API.

Would it be within the scope of this proposal to enhance Response.json() to support a reviver option?
Or is the recommendation to use Response.text() and then call JSON.parse directly with a reviver function?
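Until Response.json() grows such an option, the latter pattern works wherever this proposal is available (a sketch; the URL is hypothetical and digitsToBigInt is the reviver from the README above):

const response = await fetch("https://api.example.com/data.json");
const data = JSON.parse(await response.text(), digitsToBigInt);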

Thank you for advancing this proposal! We at Esri are tracking this proposal as part of adding support for big integers to our ArcGIS Online Cloud platform. Not being able to serialize and de-serialize is a blocker at the moment.

add an options object to JSON.parse and JSON.stringify

I want to revive a suggestion I made the last time I saw a proposal to tweak ES JSON processing: tc39/proposal-well-formed-stringify#4 (comment)

There is nothing particularly sacred about the JSON/JavaScript mapping imposed by JSON.parse and JSON.stringify. They would both greatly benefit from an options object that allowed more control over the mapping (for example, enabling built-in internalization of large integers as BigInts, or the features discussed in the present proposal).

I think introducing such an options object is the right place to start working on JSON processing extensions.

Advance to stage 4

Criteria taken from the TC39 process document, minus those from previous stages:

  • Test262 acceptance tests have been written for mainline usage scenarios, and merged
    TODO

  • Two compatible implementations which pass the acceptance tests
    TODO: Add #implementations section to README

  • Significant in-the-field experience with shipping implementations, such as that provided by two independent VMs
    TODO: Add #implementations section to README
    Bug tickets to track: TODO

  • All ECMAScript editors have signed off on the pull request
    TODO

Supply an array of keys for understanding value context?

A reviver function sees values bottom-up, but the data structure hierarchy is already known and can be supplied to it, with or without the phantom leading empty string.

const input = '{ "foo": [{ "bar": "baz" }] }';
const expectedKeys = ['foo', 0, 'bar'];
let spiedKeys;
JSON.parse(input, (key, val, {keys}) => (spiedKeys = spiedKeys || keys, val));
expectedKeys.length === spiedKeys.length;
// → true
expectedKeys.every((key, i) => spiedKeys[i] === key);
// → true

This kind of context can be important for location-dependent reviving (e.g., convert a member like "timeCreated": "2020-02-06T02:24Z" to a Date instance when it appears in the representation of a REST resource, but not when it appears in the representation of a collection of freeform key–value tags).
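A sketch of such location-dependent reviving, assuming (as in the example above) that keys contains the full path including the current key:

const reviveResourceDates = (key, val, {keys}) => {
  // Convert timeCreated only as a top-level resource member,
  // not when the same name appears inside a nested tags object.
  if (key === "timeCreated" && keys.length === 1) return new Date(val);
  return val;
};

JSON.parse(
  '{"timeCreated": "2020-02-06T02:24Z", "tags": {"timeCreated": "n/a"}}',
  reviveResourceDates
);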

bikeshedding the mechanism for serialization

In #12 (comment), it's proposed that replacer functions would be provided a unique-per-invocation rawTag symbol which would be used like

function replacer(key, val, {rawTag}) {
  if ( typeof val !== "bigint" ) return val;
  return {[rawTag]: String(val)};
}

assert.strictEqual(JSON.stringify([1n], replacer), "[1]");

I like the design goals but the design itself is somewhat clunky to use. Can I propose instead providing a per-invocation function which would perform the marking? That is:

function replacer(key, val, {raw}) {
  if ( typeof val !== "bigint" ) return val;
  return raw(String(val));
}

assert.strictEqual(JSON.stringify([1n], replacer), "[1]");

I think that ends up being a lot nicer to use while accomplishing the same goals. It's also more in line with what I've seen in the ecosystem when the problem of marking particular values arises in other contexts - e.g. the tag function in this library.

Under the hood I'm fine if the implementation of raw is to return an object with a particular unique-per-invocation symbol-named property, though it's probably nicer to have it return some new opaque object (and have JSON.stringify throw if it encounters such an opaque object produced by something other than the current invocation's raw).

cookbook scenario to JSON.parse bigint in schema-less data

say I'm working on an agile web project with BigInt where either:

  1. the schema changes so frequently with each iteration that the schema-based
    reviver function (key, val, src, keys) {...} becomes tech debt,

  2. or it has schema-less dictionaries with arbitrary key/val pairs.

in both cases I need an idiot-proof, schema-less JSON.parse solution that will preserve integer precision by up-coercing integers to BigInt as needed. would a cookbook solution look as follows?

require("http").createServer(async function (req, res) {
    let result;
    let reviver;

    reviver = function (ignore, val, src) {
    /*
     * this [schema-less] reviver will preserve integer-precision
     * by returning a bigint if precision-loss is detected
     *
     * reviver is not responsible for enforcing explicit, number/bigint schemas.
     * that is left to user after JSON.parse has done its job.
     */
        let bigint;
        // ignore non-number case
        if (typeof val !== "number") {
            return val;
        }
        // secure against malicious, 1000-digit numbers
        if (src.length > 1000) {
            throw new Error("encountered number with >1000 digits");
        }
        // TODO - handle bigdecimal
        if (src.indexOf(".") >= 0) {
            ...
        }
        try {
            bigint = BigInt(src);
        // ignore non-integer case
        } catch (err) {
            return val;
        }
        // integer precision-loss detected - return bigint
        if (BigInt(val) !== bigint) {
            return bigint;
        }
        // return val
        return val;
    };

    result = await ... // read body from http-request-stream
    // result = "{\
    //     \"dict\": {\
    //         \"bigdecimal\": 12345678901234567890.1234,\
    //         \"bigint\": 12345678901234567890,\
    //         \"float\": 1234.5678,\
    //         \"int\": 1234,\
    //     },\
    //     \"list\": [\
    //         12345678901234567890.1234,\
    //         12345678901234567890,\
    //         1234.5678,\
    //         1234\
    //     ]\
    // }"

    result = JSON.parse(result, reviver);
    // result = {
    //     "dict": {
    //         "bigdecimal": ???,
    //         "bigint": 12345678901234567890n,
    //         "float": 1234.5678,
    //         "int": 1234,
    //     },
    //     "list": [
    //         ???,
    //         12345678901234567890n,
    //         1234.5678,
    //         1234
    //     ]
    // }

    /*
     * reviver is not responsible for enforcing explicit, number/bigint schemas.
     * that is left to user after JSON.parse has done its job.
     */
    result = ...
}).listen(8080);

Assert failed for typedValNode is an ObjectLiteral Parse Node

Consider the example below:

reviver = function(p, v) {
  if (p == "a") {
    this.b = { get x() {return null}, set x(_){throw 666} }
  }
  return v;
}
JSON.parse('{"a":0,"b":1}', reviver);

According to the spec, when calling InternalizeJSONProperty({a: 0, b: { get x() {return null}, set x(_){throw 666} }}, 'b', reviver, 1)

// According to step 5:
val = { get x() {return null}, set x(_){throw 666} }
// According to step 6.c, this will fail:
// a NumberLiteral Parse Node is not an ObjectLiteral Parse Node.
Assert(1 is ObjectLiteral Parse Node)

Expose via new callback argument vs. new function

This proposal currently suggests invoking JSON.parse reviver functions with additional arguments, but @erights noted that he cannot remember any spec change that added arguments to either a built-in function or a callback invoked by a built-in function. The specification itself has nothing to say on this issue; the closest I could find is a note recommending that implementations not add custom parameters to built-in functions:

Implementations that add additional capabilities to the set of built-in functions are encouraged to do so by adding new functions rather than adding new parameters to existing functions.

If the committee is uncomfortable with providing new arguments to JSON.parse reviver functions, then we would need a shift to something like JSON.parseWithSource(source, reviver) or JSON.parseWithSource(source, options).

Naming of raw text placeholder

This proposal currently includes JSON.rawJSON and JSON.isRawJSON, but another possibility would be JSON.rawText and JSON.isRawText (although those would be somewhat less clear if extracted off the JSON namespace, e.g. const { isRawText } = JSON).

Expose position and/or input of source text?

String.prototype.replace passes position and input arguments to replacer functions and the return value from RegExp.prototype.exec has "index" and "input" properties; JSON.parse could behave similarly.

const input = '\n\t"use\\u0020strict"';
let spied;
const parsed = JSON.parse(input, (key, val, context) => (spied = context, val));
parsed === 'use strict';
// → true
spied.source === '"use\\u0020strict"';
// → true
spied.index === 2;
// → true
spied.input === input;
// → true

As noted by @rbuckton, this could be useful for error reporting.
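As an illustration of that use (a sketch assuming the index property proposed above):

const strictIntegers = (key, val, {source, index}) => {
  if (typeof val === "number" && Number.isInteger(val)
      && !Number.isSafeInteger(val)) {
    throw new RangeError(`unsafe integer ${source} at offset ${index}`);
  }
  return val;
};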

bigint ergonomics of user-function vs options-object

apologies for revisiting #5 (add an options object to JSON.parse and JSON.stringify)

I think this proposal's primary motivation is parsing BigInt? IMO an options object like { bigintIfTooBig: true } remains more ergonomic (and less error-prone):

// options-object
JSON.parse(
    "[ 1, 12345678901234567890, 3, \"foo\" ]",
    {
        bigintIfTooBig: true,
        bigintDigitLimit: 50 // security-check
    }
);
// [ 1, 12345678901234567890n, 3, "foo" ]

// user-function
JSON.parse(
    "[ 1, 12345678901234567890, 3, \"foo\" ]",
    function (key, val, {source}) {
        if (
            typeof val === "number"
            && Number.isInteger(val)
            && !Number.isSafeInteger(val)
        ) {
            // security-check
            if (source.length > 50) {
                throw new Error("bigint buffer-overflow");
            }
            return BigInt(source);
        }
        return val;
    }
);
// [ 1, 12345678901234567890n, 3, "foo" ]

round-tripping is more ergonomic as well:

let data;
data = [ 1, 12345678901234567890n, 3, "foo" ];
data = JSON.stringify(
    data,
    {
        allowBigint: true
    }
);
// "[ 1, 12345678901234567890, 3, \"foo\" ]"
data = JSON.parse(
    data,
    {
        bigintIfTooBig: true,
        bigintDigitLimit: 50
    }
);
// [ 1, 12345678901234567890n, 3, "foo" ]
