
[question] Converting array of objects from memory to streamed gzipped JSON, and then reading this content from the db and doing the opposite (stream-json, 4 comments, closed)

Ncifra commented on June 13, 2024

from stream-json.

Comments (4)

Ncifra commented on June 13, 2024

Thanks, I will look into testing these options. I think I have a blueprint for where to start with these, so for the moment, and as a general helper for someone with a similar issue, I think these will do.


uhop commented on June 13, 2024

I cannot judge if it makes sense to use stream-json in your case. If you already have a JSON string in memory, you can try JSON.parse() to convert it to an object. That would be faster than stream-json.

As to piping strings or buffers to the parser, it is simple: a parser is just a writable stream (https://nodejs.org/api/stream.html), so you can do it like this:

const {chain}  = require('stream-chain');
const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');

const pipeline = chain([
  parser(),
  streamArray()
]);

let objectCounter = 0;
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));

// now you can feed strings or buffers as per docs
pipeline.write('[');
pipeline.write('1,');
pipeline.write('"one",');
pipeline.write('{"a": 1},');
pipeline.write('[1, 2, 3]');
pipeline.end(']');
// prints: Found 4 objects.

Obviously, this is a toy example. In a real program you should avoid overflowing the internal buffers, as documented here: https://nodejs.org/api/stream.html#writablewritechunk-encoding-callback

The return value is true if the internal buffer is less than the highWaterMark configured when the stream was created after admitting chunk. If false is returned, further attempts to write data to the stream should stop until the 'drain' event is emitted.

I hope it is enough to get you started.


Ncifra commented on June 13, 2024

If you already have a JSON string in memory, you can try JSON.parse() to convert it to an object.

The problem arises in the first operation, converting an object (more precisely, an array of objects) to a string for gzip compression through Node utils: when the object is too large, JSON.stringify() can crash Node due to the string length limit.

I was thinking of stringifying chunks, or parts of the array by some arbitrary rule, e.g. every x elements that we are sure are smaller than some size y that doesn't break the JSON.stringify() call, then gzipping these and producing a final gzip from the chunks. But I think I am getting ahead of myself, and it's theoretical.

The main problem is that passing everything as-is to JSON.stringify() is prone to crashes due to the string length limit.

I hope it is enough to get you started.

Yes, thanks.


uhop commented on June 13, 2024

To deal with stuff like that, there are big guns: Disassembler converts objects to a token stream, Stringer converts a token stream to a JSON text stream, and jsonl/Stringer converts a stream of objects to a JSONL text stream.

The latter is a fast, conformant implementation of Stringer for the particular case of an array/stream of objects to be serialized. The result can be fed back to Parser (or jsonl/Parser) and processed.

It sounds like you may want either to build a JSON file manually, using the approach described in my example above but serializing the individual objects of your array separately (as per your post): this way you are responsible for the text you generate, but it is the most flexible approach. Or you can use jsonl/Stringer.

A sketch (using Brotli compression, which typically compresses better than gzip):

const {stringer: jsonlStringer} = require('stream-json/jsonl/Stringer');
const {chain} = require('stream-chain');

const fs = require('fs');
const zlib = require('zlib');

// setup a pipe
const pipeline = chain([
  jsonlStringer(),
  zlib.createBrotliCompress(),
  fs.createWriteStream('output.jsonl.br')
]);

// data
const array = [];
// later `array` is populated with objects

// serialization
for (const value of array) {
  pipeline.write(value);
}
pipeline.end();

No matter how you proceed, please heed my warning above about overflowing the internal buffers. It will change your serialization loop, making it less trivial.

