jsonlines's People

Contributors

albertwiersch, albinkc, alexey-milovidov, astroorbis, atensoftware, danarth, ds-cbo, duckbrain, eliasdorneles, ghislainfourny, happy-san, ibobik, ifcologne, imba-tjd, is2ei, j-f1, jexp, jlines, jrhizor, juarezr, keck, leoalho, nicoddemus, rspilker, sdepablos, shaver, simonfrey, slotix, sp4ce, wardi

jsonlines's Issues

"missing piece"

The documentation at https://jsonlines.org/examples says:

The biggest missing piece is an import/export filter for popular spreadsheet programs so that non-programmers can use this format.

Since the same page mentions jq, it might be worthwhile pointing out that for flat arrays of the type shown at the top of the page, the following jq command produces valid CSV:

jq -r @csv

(Similarly, jq -r @tsv produces TSV.)

Of course jq can be used more generally, e.g. as illustrated at https://stackoverflow.com/questions/57242240/jq-object-cannot-be-csv-formatted-only-array
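Where jq is not available, the same conversion can be sketched in Python. This sketch assumes every line holds a flat JSON array of scalars, and the name jsonl_arrays_to_csv is hypothetical. One difference from jq's @csv: jq quotes every string, while Python's csv module quotes a field only when it contains special characters such as the delimiter or a quote.

```python
import csv
import io
import json


def _cell(value):
    # Render JSON scalars roughly as jq's @csv does: true/false stay
    # lowercase and null becomes an empty field.
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return ""
    if isinstance(value, (list, dict)):
        raise ValueError("nested arrays/objects cannot be written as a CSV cell")
    return value


def jsonl_arrays_to_csv(jsonl_text: str) -> str:
    """Convert JSON Lines whose records are flat arrays into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines, e.g. the one after a final newline
        row = json.loads(line)
        if not isinstance(row, list):
            raise ValueError(f"expected a JSON array, got: {line!r}")
        writer.writerow([_cell(v) for v in row])
    return out.getvalue()
```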

Allow comment lines

For large static data files, it would be useful to have comment lines (e.g., lines starting with #) to document the contents or to trigger special processing.

mention the "jq" tool

Instead of the magic Python hack used as an alias, the examples page should probably recommend the identity filter "." offered by jq (http://stedolan.github.io/jq/). It produces nicely formatted JSON (even with colours if stdout is a tty) from a jsonlines file:

$ cat data.jsonl | jq .

Or even:

$ jq . data.jsonl
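For comparison, this is a rough Python sketch of what jq . does to JSON Lines input: it pretty-prints each record with two-space indentation, without colours. It is not the Python alias the examples page refers to, and the function name pretty_jsonl is hypothetical.

```python
import json


def pretty_jsonl(jsonl_text: str) -> str:
    """Pretty-print every JSON Lines record, roughly like `jq .` (no colours).

    Each record becomes an indented multi-line JSON document, so the output
    is meant for reading and is no longer JSON Lines.
    """
    chunks = []
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines, e.g. the one after a final newline
        record = json.loads(line)
        # indent=2 matches jq's default indentation; ensure_ascii=False keeps
        # characters such as "♣" readable instead of turning them into \u escapes.
        chunks.append(json.dumps(record, indent=2, ensure_ascii=False) + "\n")
    return "".join(chunks)
```

Calling it on sys.stdin.read() and writing the result to sys.stdout gives a rough substitute for jq . data.jsonl. Unlike jq, though, it reads the whole input into memory first.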

Mattermost uses jsonl

FYI: self-hosted Mattermost uses JSONL for bulk data loading, for example when migrating data from another instance or from other platforms.

Adding this to the "On the web" page would be nice, especially since Mattermost is a widely used tool; this could bring some more attention to JSON Lines.

Define standard way to convert JSONL to JSON

Perhaps it would also be useful to define an alternate form of jsonlines where the file would still be valid json. I'll refer to this alternate as jsonnewlines. If you wanted to treat a jsonnewlines file as a jsonlines file, you would simply ignore the first and last line of the file as well as the comma before each line separator.

For example, this jsonlines file:

{"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]}
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
{"name": "May", "wins": []}
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}

Would look like this as jsonnewlines:

{"winningHands":[
{"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]},
{"name": "May", "wins": []},
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
]}

It seems like this would still let you get the benefits of jsonlines while also allowing for the possibility of treating the file as JSON in whole.
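Here is a minimal Python sketch of the proposed round trip. It assumes the exact layout shown above: one record per line, with the wrapper opening on the first line and closing on the last. Both function names are hypothetical, and "jsonnewlines" is only the name proposed in this issue.

```python
import json
from typing import Any, List


def jsonl_to_jsonnewlines(jsonl_text: str, key: str) -> str:
    """Wrap JSON Lines records as {"<key>":[ ... ]}, keeping one record per line."""
    records = [ln.strip() for ln in jsonl_text.split("\n") if ln.strip()]
    body = ",\n".join(records)  # the comma goes just before each line separator
    return "{" + json.dumps(key) + ":[\n" + body + "\n]}\n"


def read_jsonnewlines_as_jsonl(text: str) -> List[Any]:
    """Read a jsonnewlines file line by line, as the proposal describes.

    Ignores the first and last lines and strips the trailing comma from each
    record line, so no parser ever needs to see the whole document at once.
    """
    lines = text.rstrip("\r\n").split("\n")
    records = []
    for line in lines[1:-1]:
        line = line.rstrip()
        if line.endswith(","):
            line = line[:-1]
        if line:
            records.append(json.loads(line))
    return records
```

The same output can also be handed to an ordinary JSON parser as one document, which is the point of the proposal.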

Line breaks

May I suggest that '\r' also be supported as a line break? I have been using JSON Lines for a while now, and Mac users do sometimes give me files where lines are terminated by '\r'. It is as natural for them as '\n' is for Linux programmers.

I do believe that JSON Lines is a very convenient format and hope to keep it that way for all users.
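If '\r' were accepted, a tolerant reader would be easy to write. The Python sketch below (the function name is hypothetical) treats \r\n, a bare \r and \n as line breaks. Splitting on a raw \r cannot cut a string value in half, because JSON requires control characters inside strings to be escaped. The sketch deliberately avoids str.splitlines(), because that method also splits on characters such as U+2028, which JSON allows unescaped inside strings.

```python
import json
import re
from typing import Any, List

# Hypothetical tolerant rule: "\r\n" (Windows), a bare "\r" (classic Mac OS)
# and "\n" (Unix) all end a line. "\r\n" is listed first so that it is
# consumed as one separator rather than two.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def parse_jsonl_any_newline(text: str) -> List[Any]:
    """Parse JSON Lines text that may use any of the three line-break styles."""
    records = []
    for line in _LINE_BREAK.split(text):
        if line.strip():  # skip empty pieces, such as the one after a final break
            records.append(json.loads(line))
    return records
```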

new suggestion to ignore commented lines

What do you think about allowing commented lines, for better readability of the file?

# Person records
{"label": "Person", "id":"001"}
{"label": "Person", "id":"002"}
# Employee records
{"label": "Employee", "id":"001"}
{"label": "Employee", "id":"002"}

Examples are very hard to read

The examples use red text on a black background, which is very hard to read. Either just use HTML formatting or change the theme of the images. Please.

validator does not accept newline at end

According to Wikipedia, the Unix convention is that text "lines are sequences of zero or more non-newline characters plus a terminating newline character". "\n" is therefore not a line separator, but a line terminator. This convention is also enforced by some text editors, which add a newline at the end of the file if there isn't one.

Your description of JSON Lines is lenient in that respect: "The last character in the file may be a line separator, and it will be treated the same as if there was no line separator present."

However, your validator does not accept lines like this. If I enter e.g. "42", that's declared valid jsonl, but "42\n" gives "SyntaxError: Unexpected end of JSON input".
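A validator that follows the quoted rule only has to drop one final line separator before checking each line. Below is a minimal Python sketch. The function name and the error format are hypothetical and are not those of the jsonlines.org validator.

```python
import json
from typing import Optional


def validate_jsonl(text: str) -> Optional[str]:
    """Return None if text is valid JSON Lines, else a short error message.

    A single line separator at the very end of the input is treated as if
    it were absent, so "42" and "42\\n" are both accepted. Any other empty
    line (including a second final newline) is reported as invalid.
    """
    if text.endswith("\n"):
        text = text[:-1]
    for number, line in enumerate(text.split("\n"), start=1):
        try:
            json.loads(line)  # also tolerates a trailing "\r" from "\r\n"
        except json.JSONDecodeError as exc:
            return f"line {number}: {exc.msg}"
    return None
```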

Node.js modules for handling JSON lines

We (i.e., @mattwagl and I) have written two Node.js modules, one for the server and one for the client, to stream and parse JSON Lines.

Maybe they are of interest for you. If you want to, have a look at them and tell us what you think :-)

You can find them on GitHub and on npm…

Thanks for the JSON lines website, and keep up the good work :-)

Standard MIME content-type

What do you think about adding a new HTTP content type for JSON Lines data?
What about application/jsonl?

Why? (Explain it on the website.)

JSON:

[
   "a",
   { "b": "c" },
   [ "d" ]
]

JSONL:

   "a"
   { "b": "c" }
   [ "d" ]

What is the difference, apart from the additional [ and ] (and the commas) in JSON?
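One practical difference can be sketched in Python, assuming the records arrive through a file-like stream. A JSONL reader can decode each record as soon as its line arrives, and a writer can append a record without touching what is already there. The function names are hypothetical.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[Any]:
    """Yield records one at a time as their lines are read from the stream."""
    for line in stream:
        if line.strip():  # skip blank lines
            yield json.loads(line)


def append_jsonl(stream: TextIO, record: Any) -> None:
    """Append one record; earlier content is never rewritten."""
    stream.write(json.dumps(record) + "\n")


# The three values from the example above, written as JSON Lines.
buf = io.StringIO()
for value in ["a", {"b": "c"}, ["d"]]:
    append_jsonl(buf, value)
buf.seek(0)
for record in iter_jsonl(buf):
    print(record)  # each record is usable as soon as its line has been read
```

With the JSON array version, appending would mean removing the closing ] first. A non-streaming parser such as Python's json.load() also returns nothing until the whole array has been read.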

you stole my idea ;-)

Hey all, thanks for putting this website up. I've used this exact format a lot in the past and was thinking about putting up a simple page describing it. Today I found out you already did so. Great!

Link to (new) Wikipedia 'JSON Streaming' article

I started writing an issue suggesting that you link to the Line Delimited JSON article on Wikipedia, and perhaps help to clean it up a little.

The more I looked at it, however, the more I realized that it wasn't a good foundation.

So I ended up writing a new wikipedia article myself: JSON Streaming

I think it's much more informative and balanced (naturally). I'd be grateful if you'd review it and, if you're happy with the content, link to it from jsonlines.org. If you spot something that needs changing or adding, please go ahead and edit the article yourself. In fact, doing so anyway, even in some small way, will help the article when the Wikipedians get around to reviewing it.

Thoughts on CSV replacement

I first encountered the idea when importing data into MongoDB a long time ago. I hadn't seen it mentioned again until I found your site. I really like that you have pushed it forward as a standard.

I want to focus on the CSV example. It makes a lot of sense.

It would be nice if this was a separate standard that borrowed from json lines. That way you could enforce some conventions.

There could be a requirement that the first row somehow indicates that it is a header.
A header could be optional.
You could also enforce the requirement that all rows must be the same length.

What are your thoughts about this?
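One way such a convention could be checked is sketched below in Python. It rests on two assumptions that are not part of JSON Lines: the first non-blank line is a mandatory header array of column names, and every following line must be an array of exactly the same length. The function name is hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def read_tabular_jsonl(text: str) -> List[Dict[str, Any]]:
    """Read a hypothetical 'CSV-like' JSON Lines file into a list of dicts."""
    header: Optional[List[str]] = None
    rows = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. after the final newline
        value = json.loads(line)
        if header is None:
            # Assumed rule 1: the first record is the header row.
            if not isinstance(value, list) or not all(isinstance(h, str) for h in value):
                raise ValueError(f"line {number}: header must be an array of strings")
            header = value
            continue
        # Assumed rule 2: every data row has exactly as many values as the header.
        if not isinstance(value, list) or len(value) != len(header):
            raise ValueError(f"line {number}: expected an array of {len(header)} values")
        rows.append(dict(zip(header, value)))
    return rows
```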

Clarify if leading white space is ignored

Well, the title pretty much says it all. Currently, the page says "trailing white space is ignored". Correct me if I am wrong, but implicitly leading white space is ignored too, because it has no content that can go into a parsed data structure (?). Perhaps this ought to be clarified.
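JSON itself already implies this. RFC 8259 defines a JSON text as a value surrounded by optional white space, so a conforming parser skips leading white space just as it skips trailing white space. Here is a quick Python check using a sample record:

```python
import json

# RFC 8259: JSON-text = ws value ws, where ws is any run of space, tab,
# line feed or carriage return. json.loads follows this on both sides.
record = '{"label": "Person", "id": "001"}'
assert json.loads("  \t" + record) == json.loads(record)
assert json.loads(record + " \r") == json.loads(record)

# Other Unicode spaces (e.g. U+00A0 NO-BREAK SPACE) are not JSON white space,
# so a line starting with one is rejected.
try:
    json.loads("\u00a0" + record)
    print("accepted")
except json.JSONDecodeError:
    print("rejected")
```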

SSL Error on Firefox

In Firefox 75, I get SSL_ERROR_BAD_CERT_DOMAIN because the certificate is only valid for the following names: *.github.com, github.com. The certificate should be replaced with a proper one to prevent SSL errors.

Comment convention

I would like to see a convention that any line starting with a # is silently skipped. We use it internally to indicate a successful conversion of files.

# This is a comment about my records
{"Foo": 1 }
{"bar": "is life", "URI": "data:;base64;jksjd" }
# This is the end of my file
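Supporting the proposed convention takes only one extra check per line. Because no JSON value can begin with #, a comment line can never be mistaken for a record. Below is a Python sketch with a hypothetical function name. It goes slightly beyond the proposal by also tolerating white space before the # and by skipping blank lines.

```python
import json
from typing import Any, List


def parse_jsonl_with_comments(text: str) -> List[Any]:
    """Parse JSON Lines, skipping lines whose first non-blank character is '#'.

    Comment support is only the convention proposed here; it is not part of
    JSON Lines, so tools that follow the format strictly will reject such files.
    """
    records = []
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        records.append(json.loads(stripped))
    return records
```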

Suggestion: Invalid lines are ignored, or more easily implemented, lines must start with trimmed '{' or '['

I love this format but the restriction that every line MUST be valid json would be better if invalid lines were just ignored.

Implementing an invalid-JSON checker that actually validates each line (in order to skip the bad ones) adds too much complexity to the specification. A very simple rule would do instead: lines that don't start with '{' or '[' are ignored during processing. Now that's cooking with gas.

If this breaks the strict "jsonlines standard" maybe call it jsonlines-loose.

---- begin of example jsonlines.jsonll ---- 
# I can now be a comment because the row will be skipped.
{ "msg": "I'll be processed" }
{ "msg": "I'll blow up by whatever parses the row as json is not closed ->"
    {"msg":"I can be a legit  row because string.trim() is simple in most languages and starts with a known good character"}
---- end of example jsonlines.jsonll ---- 
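The looser rule could be sketched in Python like this (the function name is hypothetical). The code makes two consequences visible. A malformed line that does start with '{' or '[' still raises an error, as the example expects. Records that are bare strings or numbers, which plain JSON Lines allows, would be silently skipped.

```python
import json
from typing import Any, List


def parse_jsonlines_loose(text: str) -> List[Any]:
    """Sketch of the proposed 'jsonlines-loose' reading rule.

    A line is parsed only if, after trimming white space, it starts with '{'
    or '['; every other line is ignored.
    """
    records = []
    for line in text.split("\n"):
        trimmed = line.strip()
        if not trimmed.startswith(("{", "[")):
            continue  # comments, blank lines, free text, bare scalars: ignored
        # A malformed line that does start with '{' or '[' still raises
        # json.JSONDecodeError, as the example file above expects.
        records.append(json.loads(trimmed))
    return records
```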

Wrong claims about NDJSON on "On the web" page

On the On the web page, the following entries are listed:

| NDJ is a similar format that also allows C++ style comments and blank lines
| ndjson is a similar format that also allows blank lines

Both of the above entries refer to the same URL.

The first entry has two issues:

  1. The formal name is NDJSON (uppercase), not NDJ.
  2. I see no evidence of "C++ style comments" being supported in the NDJSON spec.

Newlines in data

How are new lines in the data itself handled? Are they illegal?

It would be good to mention this.
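JSON itself answers part of this. A raw line break inside a string is not valid JSON and must be written as the escape \n, so a conforming encoder never splits a string across lines. The remaining risk is pretty-printing, which inserts line breaks between tokens. Here is a short Python illustration:

```python
import json

record = {"text": "first line\nsecond line"}

# Default json.dumps output has no indentation, and the newline inside the
# string is written as the two-character escape \n, so the record stays on
# one physical line.
line = json.dumps(record)
assert "\n" not in line
assert json.loads(line) == record

# Pretty-printing inserts real line breaks between tokens, so it must not be
# used when writing JSON Lines.
assert "\n" in json.dumps(record, indent=2)

# A raw (unescaped) line break inside a string is not valid JSON at all.
try:
    json.loads('{"text": "first line\nsecond line"}')
except json.JSONDecodeError:
    print("raw newline inside a string rejected")
```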

Nodejs JSONL validator

Hi, I have been working with several OpenAI fine-tuned models that use JSON Lines as their training data format, and I was frustrated that there was no quick and easy tool in Node.js to validate JSON Lines files, so I wrote one myself: validate-jsonl.

I could create a PR if you deem it worthy to be included on the "Examples" or "validator" pages.
