tc39 / proposal-json-parse-with-source
Proposal for extending JSON.parse to expose input source text.
Home Page: https://tc39.github.io/proposal-json-parse-with-source
License: MIT License
Criteria taken from the TC39 process document minus those from previous stages:
- Complete spec text
  - https://github.com/tc39/proposal-json-parse-with-source/blob/master/spec.html
  - https://tc39.es/proposal-json-parse-with-source/
- Designated reviewers have signed off on the current spec text
- The ECMAScript editor has signed off on the current spec text
Consider this example:
print(JSON.stringify(JSON.parse("[1, 2]", function (name, value, context) {
if (name === "0")
this[1] = [ "Hello" ];
return this[name];
})));
JSON.stringify's result is [1,["Hello",["Hello"]]] even before this proposal, because the reviver modified array[1] at runtime and we then visit this newly inserted ["Hello"] array.
But for this array we do not have correct corresponding parse information, since it does not come from the source text: the source information recorded for this property is still that of 2.
Thus, the assertion
i. Assert: typedValNode is an ArrayLiteral.
will crash, since this newly inserted array does not have the right source information: the source information for this property says 2 (it comes from the original source text).
Has this proposal been implemented anywhere yet (e.g. via a polyfill)? Thanks!
Source text access is restricted to primitive values per #10 (comment), so there is currently no information to provide for non-primitive values:
JSON.parse("42", (key, val, context) => context) // => {source: "42"}
JSON.parse("{}", (key, val, context) => context) // => ???
Some possibilities:
- undefined
- an object with no source property ({})
- an object with source: undefined
Forgive me if this is the wrong spot to put this.
I think JSON.rawJSON is a really powerful API for performance-optimizing JSON serialization. But because it is limited to producing only valid primitive JSON, it can't be used for "inline"-ing existing JSON.
I've got a couple of use cases that require feeding pre-serialized objects and arrays into the serialization of outer object trees. For example, in a typical REST API, you might retrieve 10 records from the database and reply with one big JSON array of all of them. Each record might carry a big JSON value, and if those values are large, it performs poorly to deserialize each record's JSON object only to serialize it again to produce the REST API response holding all 10 records. Instead, it'd be great to leave the data as a string when fetching from the database, and then just insert it into the final JSON string produced by JSON.stringify, using JSON.rawJSON to wrap each of these strings.
Without this capability, one has to resort either to manually cobbling together JSON strings, which is far less performant and correct than using the engine's built-in capabilities, or to always deserializing just to serialize again. Userland implementations like json-stream-stringify are far, far slower, and at least in my case the JSON objects are really big, so deserializing and reserializing is a major performance issue.
I presume there is a justification for limiting what can go through JSON.rawJSON, but what is it? And could there ever be a trusted mode, or some sort of escape hatch, where for very performance-sensitive use cases any old string could be sent along?
One other note: this low-level API could really assist with performance optimization by avoiding re-serialization of values you already have the source JSON string for, but as currently specified it can't, because it performs its safety check by parsing the string anyway. That seems correct but inefficient, again suggesting that an escape hatch for the brave would be valuable. Notably, [[IsRawJSON]] being an internal slot means that userland can't create its own raw JSON objects and pay the complexity/reliability price.
This thread is slightly off-topic, as the real-world scenario is JSON parsing in C#/Unity rather than JavaScript, but it might be informative.
Context:
RogueTech is a community mod for the RPG video game BATTLETECH, and is infamous for taking forever to start up and load - around 4 minutes on a 2018 Mac mini.
After some instrumentation patching, I found that the game plus mod calls JSONSerializationUtility.RehydrateObjectFromDictionary() 930,000 times from program startup to loading a saved game (the game has ~4000 JSON files for various RPG stats).
The ~1 million C# calls to parse JSON may have contributed to the long load time, but I have no proof. I just wanted to make people aware of a real-world application making a high number of JSON-parsing calls and suffering [maybe unrelated] performance issues.
This proposal currently covers only the parsing side, but full round-tripping would also require serialization of e.g. BigInt values as unquoted digit sequences. The committee seemed tepid about including serialization in this proposal, but I still wanted to capture the concept even if it is rejected as expected.
Apologies for revisiting #5 (add an options object to JSON.parse and JSON.stringify).
I think this proposal's primary motivation is parsing BigInt? IMO an options object like { bigintIfTooBig: true } remains more ergonomic (and less error-prone):
// options-object
JSON.parse(
"[ 1, 12345678901234567890, 3, \"foo\" ]",
{
bigintIfTooBig: true,
bigintDigitLimit: 50 // security-check
}
);
// [ 1, 12345678901234567890n, 3, "foo" ]
// user-function
JSON.parse(
"[ 1, 12345678901234567890, 3, \"foo\" ]",
function (key, val, {source}) {
if (
typeof val === "number"
&& (val >= Number.MAX_SAFE_INTEGER || val <= Number.MIN_SAFE_INTEGER)
) {
// security-check
if (source.length > 50) {
throw new Error("bigint buffer-overflow");
}
return BigInt(source);
}
return val;
}
);
// [ 1, 12345678901234567890n, 3, "foo" ]
roundtripping is more ergonomic as well:
let data;
data = [ 1, 12345678901234567890n, 3, "foo" ];
data = JSON.stringify(
data,
{
allowBigint: true
}
);
// "[ 1, 12345678901234567890, 3, \"foo\" ]"
data = JSON.parse(
data,
{
bigintIfTooBig: true,
bigintDigitLimit: 50
}
);
// [ 1, 12345678901234567890n, 3, "foo" ]
Hello!
Thank you for this proposal and your hard work!
While waiting for the proposal to become generally available, I've implemented a custom JSON parser that is forward-compatible with the proposal. It doesn't look feasible to make it a proper polyfill, considering that a JS implementation will probably be slower and we can't detect usage of the new features, but it's a ponyfill that can be used today where needed.
I'm not sure that I've implemented all the features correctly - it's not that easy to follow the standard - but all the serialization examples should work correctly. I would be very grateful if someone could review the implementation.
Thanks!
Criteria taken from the TC39 process document minus those from previous stages:
- Test262 acceptance tests have been written for mainline usage scenarios, and merged
  TODO
- Two compatible implementations which pass the acceptance tests
  TODO: Add #implementations section to README
- Significant in-the-field experience with shipping implementations, such as that provided by two independent VMs
  TODO: Add #implementations section to README
Bug tickets to track:
- A pull request has been sent to https://github.com/tc39/ecma262 with the integrated spec text
  TODO
- All ECMAScript editors have signed off on the pull request
  TODO
Say I'm working on an agile web project with BigInt where either:
- the schema changes so frequently with each iteration that the schema-based reviver function (key, val, src, keys) {...} becomes tech debt, or
- it has schema-less dictionaries with arbitrary key/value pairs.
In both cases I need an idiot-proof, schema-less JSON.parse solution that will preserve integer precision by up-coercing integers to BigInt as needed. Would a cookbook solution look as follows?
require("http").createServer(async function (req, res) {
let result;
let reviver;
reviver = function (ignore, val, src) {
/*
* this [schema-less] reviver will preserve integer-precision
* by returning a bigint if precision-loss is detected
*
* reviver is not responsible for enforcing explicit, number/bigint schemas.
* that is left to user after JSON.parse has done its job.
*/
let bigint;
// ignore non-number case
if (typeof val !== "number") {
return val;
}
// secure against malicious, 1000-digit numbers
if (src.length > 1000) {
throw new Error("encountered number with >1000 digits");
}
// TODO - handle bigdecimal
if (src.indexOf(".") >= 0) {
...
}
try {
bigint = BigInt(src);
// ignore non-integer case
} catch (err) {
return val;
}
// integer precision-loss detected - return bigint
if (BigInt(val) !== bigint) {
return bigint;
}
// return val
return val;
};
result = await ... // read body from http-request-stream
// result = "{\
// \"dict\": {\
// \"bigdecimal\": 12345678901234567890.1234,\
// \"bigint\": 12345678901234567890,\
// \"float\": 1234.5678,\
// \"int\": 1234,\
// },\
// \"list\": [\
// 12345678901234567890.1234,\
// 12345678901234567890,\
// 1234.5678,\
// 1234\
// ]\
// }"
result = JSON.parse(result, reviver);
// result = {
// "dict": {
// "bigdecimal": ???,
// "bigint": 12345678901234567890n,
// "float": 1234.5678,
// "int": 1234,
// },
// "list": [
// ???,
// 12345678901234567890n,
// 1234.5678,
// 1234
// ]
// }
/*
* reviver is not responsible for enforcing explicit, number/bigint schemas.
* that is left to user after JSON.parse has done its job.
*/
result = ...
}).listen(8080);
This is a placeholder task for the Stage 3 Specification Review feedback from @michaelficarra.
toJSON provides a capability available to any object for controlling its serialization, and a well-known Symbol would extend that capability to include raw output (e.g., JSON.stringify({toJSON(){ return {[Symbol.rawJSON]: longStringOfDigits}; }})).
Alternatively, the mechanism for raw output could be limited to each individual invocation of JSON.stringify, or even to each individual invocation of a replacer function, either of which would have the potential benefit of coupling it to specifically expressed author intent.
I want to revive a suggestion I made the last time I saw a proposal to tweak ES JSON processing: tc39/proposal-well-formed-stringify#4 (comment)
There is nothing particularly sacred about the JSON/JavaScript mapping imposed by JSON.parse and JSON.stringify. They would both greatly benefit from an options object that allowed more control over the mapping (for example, enabling built-in internalization of large integers as BigInts, or the features discussed in the present proposal).
I think introducing such an options object is the right place to start working on JSON processing extensions.
In #12 (comment), it's proposed that replacer functions would be provided a unique-per-invocation rawTag symbol, which would be used like:
function replacer(key, val, {rawTag}) {
if ( typeof val !== "bigint" ) return val;
return {[rawTag]: String(val)};
}
assert.strictEqual(JSON.stringify([1n], replacer), "[1]");
I like the design goals but the design itself is somewhat clunky to use. Can I propose instead providing a per-invocation function which would perform the marking? That is:
function replacer(key, val, {raw}) {
if ( typeof val !== "bigint" ) return val;
return raw(String(val));
}
assert.strictEqual(JSON.stringify([1n], replacer), "[1]");
I think that ends up being a lot nicer to use while accomplishing the same goals. It's also more in line with what I've seen in the ecosystem when the problem of marking particular values arises in other contexts - e.g. the tag function in this library.
Under the hood, I'm fine if the implementation of raw is to return an object with a particular unique-per-invocation symbol-named property, though it's probably nicer to have it return some new opaque object (and have JSON.stringify throw if it encounters such an opaque object produced by something other than the current invocation's raw).
This proposal currently suggests invoking JSON.parse reviver functions with additional arguments, but @erights noted that he cannot remember any spec change that added arguments to either a built-in function or a callback invoked by a built-in function. The specification itself has nothing to say on this issue; a note recommending that implementations not add custom parameters to built-in functions is the closest I could find:
Implementations that add additional capabilities to the set of built-in functions are encouraged to do so by adding new functions rather than adding new parameters to existing functions.
If the committee is uncomfortable with providing new arguments to JSON.parse reviver functions, then we would need to shift to something like JSON.parseWithSource(source, reviver) or JSON.parseWithSource(source, options).
This is a placeholder task for the Stage 3 Specification Review feedback from @waldemarhorwat.
We noticed that this proposal doesn't contain any spec text and is proposed for stage two. Without spec text, this doesn't meet the criteria for stage two: https://tc39.es/process-document/
String.prototype.replace passes position and input arguments to replacer functions, and the return value of RegExp.prototype.exec has "index" and "input" properties; JSON.parse could behave similarly.
const input = '\n\t"use\\u0020strict"';
let spied;
const parsed = JSON.parse(input, (key, val, context) => (spied = context, val));
parsed === 'use strict';
// → true
spied.source === '"use\\u0020strict"';
// → true
spied.index === 2;
// → true
spied.input === input;
// → true
As noted by @rbuckton, this could be useful for error reporting.
@waldemarhorwat notes that it is not correct to expect AssignmentExpression Parse Nodes from parsing "as a JSON text as specified in ECMA-404". Would it be correct to reference value Parse Nodes (cf. ECMA-404)?
In JSON, -1 is parsed as a single number token. But in ECMAScript, it is parsed as a UnaryExpression consisting of - and a NumericLiteral. ShallowestContainedJSONValue should therefore characterize numbers as |UnaryExpression| rather than |NumericLiteral|, so that InternalizeJSONProperty, when defining a "source" property using the source text matched by parseNode, correctly sets the value to a string containing any negation prefix.
This proposal currently includes JSON.rawJSON and JSON.isRawJSON, but another possibility would be JSON.rawText and JSON.isRawText (although those would be somewhat less clear if extracted off the JSON namespace, e.g. const { isRawText } = JSON).
Consider the example below:
reviver = function(p, v) {
if (p == "a") {
this.b = { get x() {return null}, set x(_){throw 666} }
}
return v;
}
JSON.parse('{"a":0,"b":1}', reviver);
According to the spec, when calling InternalizeJSONProperty({a: 0, b: { get x() {return null}, set x(_){throw 666} }}, 'b', reviver, 1):
// According to step 5:
val = { get x() {return null}, set x(_){throw 666} }
// According to step 6.c, this will fail: a NumericLiteral Parse Node is not an ObjectLiteral Parse Node.
Assert(1 is ObjectLiteral Parse Node)
In many cases, JSON parsing happens indirectly, when an application calls the Response.json() function after fetching a JSON payload from a service using the Fetch API.
Would it be within the scope of this proposal to enhance Response.json() to add support for a reviver option?
Or is the recommendation to use Response.text() and then call JSON.parse directly with a reviver function?
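The Response.text() route mentioned above can be sketched as follows (jsonWithReviver is a hypothetical helper name, and the doubling reviver is purely illustrative):

```javascript
// Workaround sketch: Response.json() accepts no reviver, so read the body as
// text and run JSON.parse over it manually.
async function jsonWithReviver(response, reviver) {
  const text = await response.text();
  return JSON.parse(text, reviver);
}

// Usage with a constructed Response (normally the result of fetch()):
const res = new Response('{"n": 1}');
jsonWithReviver(res, (key, val) => typeof val === "number" ? val * 2 : val)
  .then(obj => console.log(obj.n)); // → 2
```

The cost relative to Response.json() is holding the full body text in memory before parsing, which is unavoidable for JSON.parse anyway.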
Thank you for advancing this proposal! We at Esri are tracking this proposal as part of adding support for big integers to our ArcGIS Online Cloud platform. Not being able to serialize and de-serialize is a blocker at the moment.
const intToBigInt = (key, val, {source}) => typeof val === "number" && val % 1 === 0 ? BigInt(source) : val;
AFAICT, this code is going to fail for (e.g.) source === '123e2':
> BigInt('123e2')
SyntaxError: Cannot convert 123e2 to a BigInt
What if, instead of providing a string as the 3rd argument, it was an object? This would make future enhancement easy and could eventually provide some useful methods:
class JSONParseContext {
readonly source: string;
readonly keys: (string | number)[];
readonly startPosition: number;
readonly endPosition: number;
// ... more properties allowed in the future
}
A reviver function sees values bottom-up, but the data structure hierarchy is already known and can be supplied to it, with or without the phantom leading empty string.
const input = '{ "foo": [{ "bar": "baz" }] }';
const expectedKeys = ['foo', 0, 'bar'];
let spiedKeys;
JSON.parse(input, (key, val, {keys}) => (spiedKeys = spiedKeys || keys, val));
expectedKeys.length === spiedKeys.length;
// → true
expectedKeys.every((key, i) => spiedKeys[i] === key);
// → true
This kind of context can be important for location-dependent reviving (e.g., converting a member like "timeCreated": "2020-02-06T02:24Z" to a Date instance when it appears in the representation of a REST resource, but not when it appears in the representation of a collection of freeform key-value tags).