jsonlines
Documentation for the JSON Lines text file format
Visit https://jsonlines.org
To add to the "On the web" page: https://github.com/louischatriot/nedb, a datastore that relies on JSON Lines to store records.
The documentation at https://jsonlines.org/examples says:
The biggest missing piece is an import/export filter for popular spreadsheet programs so that non-programmers can use this format.
Since the same page mentions jq, it might be worthwhile pointing out that for flat arrays of the type shown at the top of the page, the following jq command produces valid CSV:
jq -r @csv
(Similarly for TSV.)
Of course jq can be used more generally, e.g. as illustrated at https://stackoverflow.com/questions/57242240/jq-object-cannot-be-csv-formatted-only-array
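For readers who would rather not install jq, the same conversion can be approximated in a few lines of Python. This is a sketch under stated assumptions: `jsonl_arrays_to_csv` and `csv_field` are hypothetical names, every line is assumed to hold a flat array, and the quoting only approximates jq's @csv (strings quoted with embedded quotes doubled, numbers and booleans bare, null as an empty field).

```python
import json


def csv_field(value):
    """Format one JSON scalar roughly the way jq's @csv does (an approximation)."""
    if value is None:
        return ""  # null becomes an empty field
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return json.dumps(value)  # numbers are written unquoted
    if isinstance(value, str):
        # Strings are always quoted; embedded quotes are doubled.
        return '"' + value.replace('"', '""') + '"'
    raise ValueError("nested arrays/objects cannot be CSV-formatted")


def jsonl_arrays_to_csv(lines):
    """Convert JSON Lines whose values are flat arrays into CSV text."""
    out = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. after a final newline
        row = json.loads(line)
        if not isinstance(row, list):
            raise ValueError(f"line {lineno}: expected a JSON array")
        out.append(",".join(csv_field(v) for v in row) + "\n")
    return "".join(out)
```

For example, the line ["Gilbert", "2013", 24, true] becomes "Gilbert","2013",24,true. For anything beyond flat arrays, jq itself remains the more flexible tool.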
For large static data files it would be useful to have comment lines (e.g. start with #) to document the contents or trigger special processing.
Editing HTML is tiresome.
VisiData is a TUI app for visualizing and processing many data formats, including JSON Lines.
Instead of the magic Python hack as an alias, the examples page should probably recommend the identity transform operator "." offered by jq (http://stedolan.github.io/jq/). This will produce nicely formatted JSON (even with colours if stdout is a tty) from a JSON Lines file:
$ cat data.jsonl | jq .
Or even:
$ jq . data.jsonl
Hi -- regarding https://jsonlines.org/on_the_web -- Miller (https://github.com/johnkerl/miller) supports JSON Lines: https://miller.readthedocs.io/en/latest/file-formats/#json-lines
FYI: self-hosted Mattermost uses JSONL to bulk load data, for example when migrating data from another instance or from other platforms.
Adding this to the "On the web" page would be nice, especially since Mattermost is a widely used tool; this could bring more attention to JSON Lines.
Perhaps it would also be useful to define an alternate form of jsonlines where the file would still be valid json. I'll refer to this alternate as jsonnewlines. If you wanted to treat a jsonnewlines file as a jsonlines file, you would simply ignore the first and last line of the file as well as the comma before each line separator.
For example, this jsonlines file:
{"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]}
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
{"name": "May", "wins": []}
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
Would look like this as jsonnewlines:
{"winningHands":[
{"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]},
{"name": "May", "wins": []},
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
]}
It seems like this would still let you get the benefits of jsonlines while also allowing for the possibility of treating the file as JSON in whole.
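A minimal Python sketch of the reading rule described above, assuming the wrapper occupies exactly the first and last lines and each record sits on one line with at most a trailing comma. "jsonnewlines" is the commenter's proposed name, not an existing format, and `read_jsonnewlines` is a hypothetical helper.

```python
import json


def read_jsonnewlines(text):
    """Read a 'jsonnewlines' file record by record, as JSON Lines.

    Applies the proposed rule: ignore the first and last lines and
    drop the comma before each line separator.
    """
    lines = text.strip("\n").split("\n")
    records = []
    for line in lines[1:-1]:  # skip the wrapper's opening and closing lines
        line = line.rstrip()  # also removes a "\r" from CRLF endings
        if line.endswith(","):
            line = line[:-1]  # the separator comma, not part of the record
        records.append(json.loads(line))
    return records
```

Because the same text is also valid JSON, json.loads(text)["winningHands"] should return the same list, which is the dual-use property the proposal is after.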
May I suggest that '\r' also be supported as a line break? I have been using JSON Lines for a while now, and Mac users do sometimes give me files where lines are terminated by '\r'. It is as natural for them as '\n' is for Linux programmers.
I do believe that JSON Lines is a very convenient format and hope to keep it that way for all users.
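A reader that accepts all three conventions can split on CRLF, a lone CR, or a lone LF. This is a sketch, not part of the JSON Lines description, and `parse_jsonl_any_newline` is a hypothetical name. Splitting on raw CR and LF is safe because JSON requires those characters to be escaped inside strings.

```python
import json
import re

# Match CRLF, a lone CR, or a lone LF. "\r\n" is listed first so a Windows
# line ending counts as one separator rather than two.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def parse_jsonl_any_newline(text):
    """Parse JSON Lines whose lines end in '\\n', '\\r\\n', or a bare '\\r'."""
    parts = LINE_BREAK.split(text)
    if parts and parts[-1] == "":
        parts.pop()  # a final separator does not start another record
    return [json.loads(part) for part in parts]
```

Python's str.splitlines() looks tempting here, but it also splits on characters such as U+2028 and U+2029, which JSON allows unescaped inside strings, so a dedicated pattern is the safer choice.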
I came across this RFC: https://tools.ietf.org/html/rfc7464
It suggests using application/json-seq as the MIME type for sequences of JSON texts. It is also cited on Wikipedia. For completeness' sake, I also created a Stack Overflow question/answer here.
If this is the correct MIME type for JSONL, then I suggest adding it to jsonlines.org, since it was the first hit on Google when I researched this.
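One caveat worth checking before adopting application/json-seq: RFC 7464 "JSON text sequences" are framed differently from JSON Lines. Each JSON text must be preceded by an ASCII Record Separator (0x1E) as well as followed by a line feed, so a plain JSON Lines file is not a valid json-seq stream as-is. A simplified Python sketch of that framing follows; it skips the RFC's rules for detecting truncated texts, and the helper names are hypothetical.

```python
import json

RS = "\x1e"  # ASCII Record Separator, required before every text by RFC 7464


def to_json_seq(values):
    """Encode values as a JSON text sequence: RS + JSON text + LF."""
    return "".join(RS + json.dumps(v) + "\n" for v in values)


def from_json_seq(text):
    """Decode a JSON text sequence by splitting on RS (simplified parser)."""
    # The trailing LF of each chunk is JSON whitespace, so json.loads accepts it.
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]
```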
The website would benefit from some sort of continuous integration that publishes it directly after content is pushed to GitHub.
Hi! On http://jsonlines.org/on_the_web/ it says "dat uses JSON Lines (newline-delimited JSON) in its streaming APIs". I think this might not be true any more, because dat is now focused on files in general rather than on any one data format.
What do you think about comment lines for better readability of the file?
[hash] Person records
{"label": "Person", "id":"001"}
{"label": "Person", "id":"002"}
[hash] Employee records
{"label": "Employee", "id":"001"}
{"label": "Employee", "id":"002"}
https://mixpanel.com/help/reference/exporting-raw-data#api-details
Add to "On the web"!
Hey, I noticed that http://ndjson.org/ and http://jsonlines.org/ are very similar. I was wondering whether they could link to each other to reduce confusion? Personally I like both names and use them interchangeably.
cc @chrisdew
The examples use red text on a black background, which is very hard to read. Please either use plain HTML formatting or change the colour theme of the images.
According to Wikipedia, the Unix convention is that text "lines are sequences of zero or more non-newline characters plus a terminating newline character". "\n" is therefore not a line separator, but a line terminator. This convention is also enforced by some text editors, which add a newline at the end of the file if there isn't one.
Your description of JSON Lines is lenient in that respect: "The last character in the file may be a line separator, and it will be treated the same as if there was no line separator present."
However, your validator does not accept input like this. If I enter e.g. "42", it is declared valid JSONL, but "42" followed by a newline gives "SyntaxError: Unexpected end of JSON input".
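A parser that follows the quoted rule only needs to drop one final line separator before splitting. The validator's source isn't shown here, so this Python sketch illustrates the intended behaviour rather than patching it; `parse_jsonl` is a hypothetical name.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text, accepting an optional final line separator.

    A trailing separator is treated as if it were absent, so "42" and
    "42\\n" both parse to [42].
    """
    if text.endswith("\n"):
        # Drop the optional final separator. A "\r" left behind by CRLF
        # is JSON whitespace, so json.loads still accepts that line.
        text = text[:-1]
    if not text:
        return []
    return [json.loads(line) for line in text.split("\n")]
```

Only one trailing separator is forgiven: "42" followed by two newlines still fails, because the second one leaves an empty line.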
We (i.e., @mattwagl and I) have written two Node.js modules, one for the server and one for the client, to stream and parse JSON Lines.
Maybe they are of interest to you. If you want, have a look at them and tell us what you think :-)
You can find them on GitHub and on npm…
Thanks for the JSON lines website, and keep up the good work :-)
What do you think about adding a new HTTP content type for JSON Lines data? What about application/jsonl?
Link in https://github.com/wardi/jsonlines/blob/gh-pages/on_the_web/index.html is broken...
The site hosting http://trephine.org/t/index.php?title=Newline_delimited_JSON seems to have lapsed, currently it is serving a placeholder.
The last archive copy of it is at http://web.archive.org/web/20141009224232/http://trephine.org/t/index.php?title=Newline_delimited_JSON
Thanks for the useful jsonl tips,
Matthew
Since a newline is escaped as \n in JSON strings, how do you escape it in JSON Lines, where the newline character (\n or \r\n) is the delimiter between objects?
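Nothing extra should be needed: a JSON serializer already writes a newline inside a string as the two-character escape \n (a backslash followed by the letter n), and raw control characters are not allowed inside JSON strings, so a serialized record never contains a real line feed. A short Python illustration using only the standard json module:

```python
import json

records = [{"text": "first line\nsecond line"}, {"text": "no newline"}]

# Serialize: json.dumps escapes the newline in the string as the two
# characters "\" and "n", so each record fits on one physical line.
jsonl = "".join(json.dumps(r) + "\n" for r in records)
print(jsonl, end="")
# {"text": "first line\nsecond line"}
# {"text": "no newline"}

# Parse: splitting on real line feeds is safe, because no raw line feed
# can appear inside a JSON string.
parsed = [json.loads(line) for line in jsonl.split("\n") if line]
```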
A validator for .jsonl files or pasted content, just like jsonlint.
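A minimal Python sketch of such a validator, reporting the number of every line that fails to parse. `validate_jsonl` is a hypothetical name, and treating blank lines as errors is an assumption based on the rule that each line must be a valid JSON value.

```python
import json


def validate_jsonl(lines):
    """Return (line_number, message) pairs for invalid lines; [] means valid."""
    errors = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")  # drop the line terminator, if any
        if not line.strip():
            errors.append((lineno, "blank line"))
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, exc.msg))
    return errors
```

Because iterating over an open file yields one line at a time, it can be called as validate_jsonl(f) on a file opened with open("data.jsonl", encoding="utf-8") (a hypothetical file name); a final line separator then produces no extra blank line.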
JSON:
[
"a",
{ "b": "c" },
[ "d" ]
]
JSONL
"a"
{ "b": "c" }
[ "d" ]
What is the difference, apart from the additional [ and ] in JSON?
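Beyond the brackets and commas, the practical difference is that each JSONL line is a complete JSON value on its own. A reader can therefore process one record at a time, and a writer can append a record without touching the rest of the file, whereas the JSON array has to be parsed (or rewritten) as a whole. A short Python sketch using the two examples above:

```python
import json

json_text = '[\n"a",\n{ "b": "c" },\n[ "d" ]\n]'
jsonl_text = '"a"\n{ "b": "c" }\n[ "d" ]\n'

# JSON: one document; json.loads needs the whole text, closing bracket included.
from_json = json.loads(json_text)

# JSON Lines: every line is an independent document, parsed on its own.
from_jsonl = [json.loads(line) for line in jsonl_text.split("\n") if line]

assert from_json == from_jsonl == ["a", {"b": "c"}, ["d"]]

# Appending to JSON Lines is a single write of one more line...
jsonl_text += json.dumps({"e": 1}) + "\n"
# ...whereas appending to the JSON array means parsing and re-serializing it.
json_text = json.dumps(json.loads(json_text) + [{"e": 1}])
```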
According to https://downforeveryoneorjustme.com/jsonlines.org or https://isitdownorjust.me/jsonlines-org/, the site is down 😭
Hey all, thanks for putting this website up. I've used this exact format a lot in the past and was thinking about putting up a page with a simple description of the format. Today I found out you already did. Great!
I suggest adding Shopify's GraphQL Bulk API to the "On the web" page as an example of a service that uses JSONL:
https://shopify.dev/api/usage/bulk-operations/queries
Sometimes the most brilliant ideas are the simplest. As someone who works with huge data files on a daily basis, I appreciate you creating and popularizing JSONL!
I started writing an issue suggesting that you link to the Line Delimited JSON article on wikipedia, and perhaps help to clean it up a little.
The more I looked at it, however, the more I realized that it wasn't a good foundation.
So I ended up writing a new wikipedia article myself: JSON Streaming
I think it's much more informative and balanced (naturally). I'd be grateful if you'd review it and, if you're happy with the content, link to it from jsonlines.org. If you spot something that needs changing or adding, please go ahead and edit the article yourself. In fact doing that anyway, even in some small way, will help the article when the Wikipedians get around to reviewing it.
I first encountered the idea when importing data into mongodb a long time ago. Since then, I haven't seen any mention of it until I found your site. I really like it that you have pushed it forward as a standard.
I want to focus on the CSV example. It makes a lot of sense.
It would be nice if this was a separate standard that borrowed from json lines. That way you could enforce some conventions.
There could be a requirement that the first row somehow indicates that it is a header.
A header could be optional.
You could also enforce the requirement that all rows must be the same length.
What are your thoughts about this?
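A sketch of how such a stricter tabular profile might be checked, assuming every line is a flat JSON array. How the header would be marked is left open above, so the `has_header` flag (and the rule that header cells are strings) is a hypothetical stand-in, as is the name `read_tabular_jsonl`.

```python
import json


def read_tabular_jsonl(lines, has_header=False):
    """Read JSON Lines where every line is an array of the same length.

    has_header is a hypothetical flag standing in for whatever marker the
    convention would use. Returns (header, rows); header is None if absent.
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    header = None
    if has_header and rows:
        header = rows.pop(0)
        if not isinstance(header, list) or not all(isinstance(c, str) for c in header):
            raise ValueError("header row must be an array of strings")
    # The header, if present, fixes the width; otherwise the first data row does.
    width = len(header) if header is not None else (len(rows[0]) if rows else 0)
    for index, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != width:
            raise ValueError(f"data row {index + 1}: expected an array of length {width}")
    return header, rows
```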
ClickHouse DBMS has support for jsonlines format under the name "JSONEachRow".
https://clickhouse.yandex/reference_en.html#JSONEachRow
Well, the title pretty much says it all. Currently, the page says "trailing white space is ignored". Correct me if I am wrong, but leading white space is implicitly ignored too, because it contributes nothing to the parsed data structure (?). Perhaps this ought to be clarified.
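For what it's worth, JSON's own grammar (RFC 8259) allows insignificant whitespace before and after a value, so a conforming parser already ignores leading whitespace on a line. A quick check with Python's standard json module:

```python
import json

# Whitespace (space, tab, CR, LF) may surround any JSON value,
# so these lines parse exactly as if it were absent.
assert json.loads('   {"a": 1}') == {"a": 1}
assert json.loads('\t[1, 2]  ') == [1, 2]
```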
In Firefox 75, I get SSL_ERROR_BAD_CERT_DOMAIN because "The certificate is only valid for the following names: *.github.com, github.com". The certificate should be replaced with a proper one to prevent SSL errors.
For page:
https://jsonlines.org/on_the_web/
Please add that CSS HTML Validator for Windows v22.0211+ ( https://www.htmlvalidator.com/ ) now supports JSON Lines syntax checking.
Hi! I wanted to let you know that I am developing a jsonlines reader for the Julia language. You can find the latest docs here. I am happy for all suggestions and ideas. I hope it's ok to create an issue for this. ;)
The popular MongoDB uses JSON Lines to import/export (potentially large) data: http://docs.mongodb.org/manual/reference/program/mongoexport/#cmdoption--query
Apache Spark (http://spark.apache.org) uses JSONL for reading and writing JSON data with DataFrameReader.json(String path) and DataFrameWriter.json(String path).
Logstash supports jsonlines via the json_lines codec.
What is the appropriate MIME type for JSON Lines? Is it application/json, too, or should I use something different?
Scrapy is a long-time JSON lines user; this is a default export format since forever. See also: https://twitter.com/pablohoffman/status/547385799093022723 :)
As a data point, Scrapy uses the .jl extension for JSON Lines.
Couchbase Admin Console (and in a few months, its web application, Capella) can import and export JSON line files. FYI.
I would like to see a convention that any line starting with a # is silently skipped. We use it internally to indicate a successful conversion of files.
# This is a comment about my records
{"Foo": 1 }
{"bar": "is life", "URI": "data:;base64;jksjd" }
# This is the end of my file
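A minimal Python sketch of a reader following this proposed convention. It is not part of the JSON Lines description, and `read_jsonl_skip_comments` is a hypothetical name. Since no JSON text can start with #, the rule cannot swallow a valid record.

```python
import json


def read_jsonl_skip_comments(lines):
    """Yield parsed records, silently skipping lines that start with '#'."""
    for line in lines:
        if line.startswith("#"):
            continue  # comment line under the proposed convention
        yield json.loads(line)
```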
Suggestion: invalid lines are ignored, or, more easily implemented, lines must start with '{' or '[' after trimming.
I love this format, but instead of requiring that every line MUST be valid JSON, it would be better if invalid lines were simply ignored.
Implementing a checker that actually validates the JSON on each line adds too much complexity to the specification, but a very simple rule does not: lines that don't start with '{' or '[' are ignored during processing. Now that's cooking with gas.
If this breaks the strict "jsonlines standard" maybe call it jsonlines-loose.
---- begin of example jsonlines.jsonll ----
# I can now be a comment because the row will be skipped.
{ "msg": "I'll be processed" }
{ "msg": "I'll blow up by whatever parses the row as json is not closed ->"
{"msg":"I can be a legit row because string.trim() is simple in most languages and starts with a known good character"}
---- end of example jsonlines.jsonll ----
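A Python sketch of the proposed "jsonlines-loose" rule, with `read_jsonlines_loose` as a hypothetical name. Only the filtering is loose: a line that starts with { but is malformed, like the third line of the example, still raises an error, and scalar lines such as 42 or "text", which plain JSON Lines allows, would be skipped.

```python
import json


def read_jsonlines_loose(lines):
    """Yield records from lines whose trimmed text starts with '{' or '['.

    All other lines are skipped. Lines that pass the filter are parsed
    strictly, so a malformed object raises json.JSONDecodeError.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith(("{", "[")):
            continue  # comments, blank lines, and other noise are ignored
        yield json.loads(stripped)
```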
On the "On the web" page, the following entries are listed:
| NDJ is a similar format that also allows C++ style comments and blank lines
| ndjson is a similar format that also allows blank lines
Both of the above entries refer to the same URL.
The first entry has two issues:
For someone arriving newly to the json-object-per-line idea, it would be useful to have a list of the variant forms (distinguishing features, website, tools).
I stumbled across https://github.com/fictorial/json-line-protocol which is CRLF-based.
Hope this is useful,
m
And if we're not settled on the MIME type yet, application/x-jsons.
How are newlines in the data itself handled? Are they illegal?
It would be good to mention this.
Hi, I have been working with several OpenAI fine-tuned models that use JSON Lines as their training data format, and I was frustrated that there was no easy, quick tool in Node.js to validate JSON Lines files, so I wrote one myself: validate-jsonl.
I could create a PR if you deem it worthy to be included on the "Examples" or "validator" pages.