
Why? (explain it on website) (jsonlines, open issue)

wardi commented on August 24, 2024
Why? (explain it on website)

from jsonlines.

Comments (7)

wardi commented on August 24, 2024

@GabrielGorta thank you for asking, here are a few reasons:

  • we can add records to JSONL with file append operations (faster than parsing and re-writing)
  • we can jump to a specific record in JSONL by counting newlines instead of parsing JSON (much faster)
  • JSONL is better suited for logging/streaming data because of the above
  • we can use common line-oriented unix tools e.g. grep for finding complete matching records and producing another valid JSONL output

We could add some more examples like these to the site.
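As a rough illustration of the first two points, here is a minimal Python sketch. The helper names `append_record` and `read_record` and the file name `log.jsonl` are made up for this example, not part of any jsonlines tooling. Appending never re-reads the existing file, and fetching record N only parses that one line.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single line; existing content is never re-read."""
    with open(path, "a", encoding="utf-8") as f:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        f.write(json.dumps(record) + "\n")


def read_record(path, index):
    """Return the record at zero-based `index`, parsing only that one line."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            if line_number == index:
                return json.loads(line)
    raise IndexError(index)


# Hypothetical log file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "log.jsonl")
for n in range(5):
    append_record(path, {"num": n})

third = read_record(path, 2)
```

The lines before the requested record are still read from disk, but they are only scanned for newlines, never parsed as JSON.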


polarathene commented on August 24, 2024
> [
>    { "a": "b" }
> ]
>
> is fine

Not fine.

You have a single element in that array; as soon as you add another, you need a comma after the first one. JSON5 and JSONC allow a trailing comma after the last element, I think, but the records would still need to be wrapped in an array.

asciinema, for example, uses JSONL for its .cast recordings, since it can simply append each new frame to the output file.


GabenGar commented on August 24, 2024

> I don't see a difference whether the "separator" is \n or ,?\n: just do content.split(/,?\n/) and you get exactly the same result as content.split('\n') gives for JSONL

content.split() assumes the whole content value is already loaded into memory. And every time the underlying file changes, you have to reparse and split it again. Since JSON requires parsing the entire document to assert its validity, that is at least an O(N) operation that cannot be cached, which makes it unsuitable for storing large collections.
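To make the contrast concrete, here is a small Python sketch of reading JSONL lazily. The names `iter_records` and `lines` are hypothetical, and the in-memory list only stands in for a large file. The consumer stops as soon as it finds what it needs, and the `pulled` list shows that later lines were never touched.

```python
import json

pulled = []  # tracks which raw lines were actually read


def lines():
    """Stand-in for a large file: hands out one raw line at a time."""
    for raw in ['{"id": 1}\n', '{"id": 2}\n', '{"id": 3}\n', '{"id": 4}\n']:
        pulled.append(raw)
        yield raw


def iter_records(stream):
    """Yield one parsed record per non-empty line, reading lazily."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stop at the first match; the remaining lines are neither read nor parsed.
match = next(r for r in iter_records(lines()) if r["id"] == 2)
lines_read = len(pulled)
```

With a single JSON array, the equivalent lookup would normally require parsing the whole document first, unless an incremental parser is used.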


gabrielgortabns commented on August 24, 2024

Well, this explanation would be fine to mention on the website.

Also, JSON can be formatted so that each line holds exactly one value from the array, so practically all the reasons you mentioned would be possible with plain JSON, with no need for parsing...
Of course, that's true only with that specific formatting. E.g.

[
   { "a": "b" }
]

is fine, but in the case of:

[
    {
        "a": "b"
    }
]

is a problem. In JSONL it is not, because there is always exactly one JSON value per line, but this needs to be explained on the website.
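The difference between the two layouts can be checked with a short Python sketch. The function `records_by_line` is a hypothetical, deliberately naive helper that treats every line as one record and drops a trailing comma; it handles the one-value-per-line layout but fails on the pretty-printed one.

```python
import json

one_per_line = '[\n   { "a": "b" },\n   { "c": "d" }\n]'
pretty = '[\n    {\n        "a": "b"\n    }\n]'


def records_by_line(text):
    """Naive approach: treat every line as a record, dropping a trailing comma."""
    records = []
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        records.append(json.loads(line))  # raises if the line is only part of a value
    return records


one_per_line_records = records_by_line(one_per_line)

try:
    records_by_line(pretty)
    pretty_ok = True
except json.JSONDecodeError:
    # The line "{" on its own is not a complete JSON value.
    pretty_ok = False
```

Both inputs are valid JSON describing an array, yet only one of them survives line-based processing, so the approach depends on a formatting convention that JSON itself does not guarantee.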


GabenGar commented on August 24, 2024

JSON is not a streamable format; formatting it for human consumption has nothing to do with that.


gabrielgortabns commented on August 24, 2024

And what is the problem with a trailing comma? It can simply be removed when selecting a specific line (or JSON value), and you still have valid JSON. E.g. in a terminal, when I use grep, I just pipe the output to a function that removes the trailing comma (if there is one) from the string, before it even reaches the parser. Removing the last character from a string is not costly at all.

I don't see a difference whether the "separator" is \n or ,?\n: just do content.split(/,?\n/) and you get exactly the same result as content.split('\n') gives for JSONL


polarathene commented on August 24, 2024

Yes, but the reasons have already been cited above.

  • Appending a new record to a file: there is no start/end array notation ([ and ]), so you don't have to delete the last line (]) and append a , to the previous one to keep the JSON valid. You just append the record: echo '{"num": 42}' >> log.jsonl. This also suits a stream that collects input until \n.
  • With your JSON input, once a record has more items, the formatter won't keep each top-level element of the document on a single line. JSON is typically either minified to a single line or pretty-printed with multi-level indentation that expands nested items. You would now need to transform that too, to get the equivalent of JSONL.
  • With streaming input, if you don't have the \n delimiter, you need something else to act as the document delimiter. RFC 7464 does this with RS, the ASCII record separator. You'd still need to insert that though.
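For the last point, a minimal Python sketch of RFC 7464 style framing, where each JSON text is preceded by RS (0x1E) and followed by a line feed. The function names `encode_seq` and `decode_seq` are hypothetical, and the decoder skips the RFC's recovery rules for truncated or malformed texts.

```python
import json

RS = "\x1e"  # ASCII Record Separator, the delimiter used by RFC 7464


def encode_seq(records):
    """Frame each record as RS + JSON text + LF."""
    return "".join(RS + json.dumps(r) + "\n" for r in records)


def decode_seq(text):
    """Split on RS and parse each non-empty chunk (no error recovery)."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


encoded = encode_seq([{"a": 1}, {"b": [1, 2]}])
decoded = decode_seq(encoded)
```

Because JSON strings must escape control characters, RS cannot appear inside a valid JSON text, which is why it can serve as an unambiguous delimiter even when the JSON itself is pretty-printed across several lines.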

You can transform JSON to JSONL; the more practical approach is to use something like jq to parse the JSON array and output JSONL (https://stackoverflow.com/questions/42178636/how-to-use-jq-to-output-jsonl-one-independent-json-object-per-line). That would be more reliable than your split approach.
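The same parse-then-emit idea can be sketched in Python; with jq it would be roughly `jq -c '.[]' input.json`. The function name `json_array_to_jsonl` is made up for this example. The input is parsed with a real JSON parser and each top-level element is written out compactly on its own line, regardless of how the input was formatted.

```python
import json


def json_array_to_jsonl(text):
    """Parse a JSON array and return one compact JSON value per line."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    return "".join(json.dumps(item, separators=(",", ":")) + "\n" for item in items)


pretty = """[
    {
        "a": "b"
    },
    {
        "c": [1, 2]
    }
]"""
jsonl = json_array_to_jsonl(pretty)
```

This still loads the whole array into memory once, which is acceptable for a one-off conversion; after that, the JSONL output can be appended to and processed line by line.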

The advantage of JSONL, as mentioned, is streaming: one program can output data as it's ready while another program ingests it as it arrives, without blocking until the full document has been constructed (if one would ever even be "completed", as with a file that's frequently appended to, e.g. logs).
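A small Python sketch of that producer/consumer arrangement, using a thread and an OS pipe in place of two separate programs. The `producer` function and the record shape are assumptions made for illustration only. The reader handles each record once its line is complete, without needing any closing bracket from the writer.

```python
import json
import os
import threading


def producer(write_fd, count):
    """Write `count` records to the pipe, one per line, flushing after each."""
    with os.fdopen(write_fd, "w", encoding="utf-8") as out:
        for n in range(count):
            out.write(json.dumps({"num": n}) + "\n")
            out.flush()  # hand the record to the reader without waiting for the rest


read_fd, write_fd = os.pipe()
thread = threading.Thread(target=producer, args=(write_fd, 3))
thread.start()

received = []
with os.fdopen(read_fd, "r", encoding="utf-8") as incoming:
    for line in incoming:  # every complete line is a complete record
        received.append(json.loads(line))

thread.join()
```

The loop ends when the producer closes its end of the pipe; a producer that never finishes, such as a logger, would simply keep the consumer fed indefinitely.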

