multiple json objects in input? about gron HOT 9 OPEN

tomnomnom avatar tomnomnom commented on June 2, 2024 1
multiple json objects in input?

from gron.

Comments (9)

 avatar commented on June 2, 2024 4

jq -c leaves one object per line. So IMO

  • it would be OK if gron could work with JSON Lines/ndjson out of the box, without sacrificing speed, and
  • only switched to the expensive "multiple lines per JSON object and multiple objects" parsing if a command-line flag is given

tomnomnom avatar tomnomnom commented on June 2, 2024 3

@jan-schulz-k24 @filippog firstly: thank you for your patience!

I've added basic support for multi-object input in b9faf39

At the moment it only supports one object per line; I'm not 100% convinced this is the best solution but it's certainly the easiest to implement.

You can use the feature with the -s/--stream flag:

tom@work:~▶ cat stream.json 
{"one": 1, "two": 2, "three": [1, 2, 3]}
{"one": 1, "two": 2, "three": [1, 2, 3]}
tom@work:~▶ gron --stream stream.json 
json = [];
json[0] = {};
json[0].one = 1;
json[0].three = [];
json[0].three[0] = 1;
json[0].three[1] = 2;
json[0].three[2] = 3;
json[0].two = 2;
json[1] = {};
json[1].one = 1;
json[1].three = [];
json[1].three[0] = 1;
json[1].three[1] = 2;
json[1].three[2] = 3;
json[1].two = 2;

Internally it reads the input line by line, so it starts producing output as soon as a line is available to read. In the example below, the output appears in three chunks at two-second intervals.

tom@work:~▶ cat delay.sh 
#!/bin/bash
echo '{"one": 1, "two": 2}'
sleep 2
echo '{"three": 3, "four": 4}'
sleep 2
echo '{"five": 5, "six": 6}'
tom@work:~▶ ./delay.sh | gron -s
json = [];
json[0] = {};
json[0].one = 1;
json[0].two = 2;
json[1] = {};
json[1].four = 4;
json[1].three = 3;
json[2] = {};
json[2].five = 5;
json[2].six = 6;

This should make it possible to work with streaming HTTP APIs, most of which seem to provide one object per line.
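
The line-by-line behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not gron's actual (Go) implementation: the names `gron_lines` and `stream` are hypothetical, and key quoting is simplified to plain identifiers.

```python
import json
from typing import Any, Iterable, Iterator


def gron_lines(value: Any, path: str) -> Iterator[str]:
    """Yield gron-style assignment statements for one decoded JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            # Assumption: keys are simple identifiers; other keys would need quoting.
            yield from gron_lines(value[key], f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for i, item in enumerate(value):
            yield from gron_lines(item, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)};"


def stream(lines: Iterable[str]) -> Iterator[str]:
    """Treat every non-empty line as one JSON document and emit output as it arrives."""
    yield "json = [];"
    index = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between objects
        yield from gron_lines(json.loads(line), f"json[{index}]")
        index += 1
```

Because `stream` is a generator, statements for one line are produced before the next line is read. Passing it `sys.stdin` and printing each statement with `flush=True` should give chunked output similar to the delay.sh example.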

I haven't tagged a release yet, and I'm going to leave this issue open for a while longer because I'd like to think more about supporting objects that span many lines.

Let me know your thoughts / if you have any problems.

Thanks again for your patience at this particularly busy time in my life! 😆

filippog avatar filippog commented on June 2, 2024 2

Thanks @tomnomnom for working on this! I did a quick test with the dataset I have and it works great with --stream!

rjp avatar rjp commented on June 2, 2024 1

I've got JSON output with multiple objects but they're not one-per-line - gron currently only handles the first of these objects. I do have a hacky/sketchy patch for stream mode which handles this case but obviously don't want to step on any toes if there's another solution in the works?

noahp avatar noahp commented on June 2, 2024 1

@rjp I also ran into this issue (specifically with the GitHub cli tool's --paginate option), and I ended up with this hacky sed one-liner to work around it:

gh api --paginate /repos/{owner}/{repo}/environments | sed -E 's|\}\{|\}\n\{|g' | gron --stream 
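
One caveat with the sed approach is that it would also split a `}{` that happens to appear inside a string value. A possible alternative, sketched here in Python and not part of gron, is to let a real JSON decoder find the end of each value. `json.JSONDecoder.raw_decode` is a standard-library call that returns the decoded value together with the index where it stopped; the helper name `iter_json_values` is hypothetical.

```python
import json
from typing import Any, Iterator


def iter_json_values(text: str) -> Iterator[Any]:
    """Yield each JSON value from text holding several concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value
```

Printing `json.dumps(value)` for each yielded value gives one object per line, which is the shape `gron --stream` expects. Note that this sketch reads the whole input into memory first, so it is not a true streaming solution.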

tomnomnom avatar tomnomnom commented on June 2, 2024

Hi @filippog! Thank you! :)

It certainly can be supported, and I think it would be a good idea.

Do you think it would be OK to enable the feature with a command line option rather than trying to auto-detect multiple objects in the input?

filippog avatar filippog commented on June 2, 2024

If autodetection is expensive and/or unreliable to convert to/from gron then yeah a command line option would do. Otherwise I was expecting gron to just work when fed multiple objects, case in point for me is reading from an access log where every entry is a json object, separated by \n

 avatar commented on June 2, 2024

I currently have the same problem, in this case when using it with jq:

Minimal example:

λ cat ~/gron_tmp                              
{ 
"data" : [
  {"a": "1"}, 
  {"a": "2"}
  ]
}

λ cat ~/gron_tmp |jq '.data[]'      
{
  "a": "1"
}
{
  "a": "2"
}

λ cat ~/gron_tmp |jq '.data[]' |gron
json = {};
json.a = "1";

I've also seen logfiles consisting of one json object per line (ndjson), but that expects minified json.

-> my preference would be that if an object ends and a new one starts straight after (only whitespace between }...{), gron just takes that object as well. I wouldn't even mind if there were no difference between the lines:

λ cat ~/gron_tmp |jq '.data[]' |gron
json = {};
json.a = "1";
json = {};
json.a = "2";

Although a commandline switch to implicitly treat the input as a list would be nice:

λ cat ~/gron_tmp |jq '.data[]' |gron --assume-list
json[0] = {};
json[0].a = "1";
json[1] = {};
json[1].a = "2";

tomnomnom avatar tomnomnom commented on June 2, 2024

Hey, sorry this hasn't had the attention it needs... Kids keep you pretty busy!

I've been giving some thought to the approach needed for this, and there are probably only two sane options:

  1. Require that the input be one JSON blob per line so it's easy to split on \n
  2. Do a pre-parse step to detect multiple JSON objects in the input

Option 1 is by far the easiest to implement, but it doesn't work for @jan-schulz-k24's example where each JSON blob spans multiple lines.

Option 2 is far more permissive, but it requires a rune-by-rune inspection of the input text (you can't just, say, regex for }[^,]*{ because that sequence could appear in a string value)
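
To illustrate why a character-by-character pass is needed, here is a minimal Python sketch of such a pre-parse step. It is an assumption about how option 2 could work, not gron's implementation: the function name `split_top_level` is hypothetical, it only looks for top-level objects and arrays, and it does not validate the JSON it finds.

```python
from typing import Iterator, Optional


def split_top_level(text: str) -> Iterator[str]:
    """Yield the source text of each top-level JSON object or array in text."""
    depth = 0
    in_string = False
    escaped = False
    start: Optional[int] = None
    for i, ch in enumerate(text):
        if in_string:
            # Inside a string, braces and brackets are plain characters.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            if depth == 0:
                start = i  # a new top-level value begins here
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0 and start is not None:
                yield text[start : i + 1]
                start = None
```

Tracking the string and escape state is what lets the scan ignore a `}{` sequence inside a string value, which a regex over the raw text cannot do. The cost is that every character of the input has to be visited once.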

The problem with option 2 is that it's pretty expensive to do, especially when the input is very large. This is made slightly better by only enabling multi-object input when a command line flag is specified.

On balance I think that gron working in more situations is more important than performance, so option 2 is probably best.
