
Comments (12)

fangq commented on May 16, 2024

No, it is still not clear. You've already given yourself counterexamples. Unless the N->1 mapping illustrated in your counterexamples (again, many other forms exist) can be resolved, I don't see a clear round-trip-invariant conversion between MATLAB data and JSON.

The argument that an automated encoder "cannot produce" those counterexamples is not strong enough. You can't predict/specify what an encoder should output. For jsonlab, any valid JSON should be accepted. But for the above reasons, it cannot guarantee level-1 round-trip convergence (i.e. arbitrary input data -> JSON -> reproduced data, or JSON -> MATLAB data -> reproduced JSON).

On the other hand, if you read my comments in this closed tracker (#1 (comment)), one of the current design goals of jsonlab is to satisfy level-2 or higher round-trip convergence. In other words, if you iterate the input through loadjson/savejson multiple times, I want the output to be reproduced. Even this is difficult, not to mention handling arbitrary user input (either data or a JSON string)
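A rough sketch of what checking level-2 convergence looks like, using Python's json module as a stand-in for loadjson/savejson (Python's json passes this trivially because its output is deterministic; jsonlab's goal is that loadjson/savejson should pass it too):

```python
import json

def is_level2_stable(text, n=3):
    """Level-2 round-trip convergence: after one load/save cycle,
    further cycles must reproduce the same string. Python's json
    stands in for loadjson/savejson here."""
    prev = json.dumps(json.loads(text))
    for _ in range(n):
        cur = json.dumps(json.loads(prev))
        if cur != prev:
            return False
        prev = cur
    return True

# Whitespace differences in the input are absorbed by the first cycle:
print(is_level2_stable('[ [1], [2],\n    [3] ]'))  # True
```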

from jsonlab.

fangq commented on May 16, 2024

You can't predict/specify what an encoder should output.
Pardon?

What I meant was: as a general-purpose JSON parser (at least that's what loadjson aims to be), I cannot specify the behavior and format of users' JSON encoders - they can use any encoder!

In fact, reading my past emails from users and posts on the mailing list, many of them (>30%) produced their JSON data with an encoder other than savejson() - some were web-streamed data, some were dumped from dedicated software, and some were hand-coded. From loadjson's perspective, the only restriction I can impose is that the input must be a valid JSON string. Other than that, I have no control over the depth of the brackets, the use of white space, or users' preferred data formats.

For jsonlab, any valid JSON should be accepted.
Maybe that's the point where we disagree

I agree. But jsonlab is open source, and both savejson and loadjson have straightforward structures. If you want to contribute and add round-trip conservation for MATLAB-originated data, that is absolutely welcome! I just don't know how to do it (i.e. implement it in a consistent and efficient way).


neogeogre commented on May 16, 2024

Because of this problem:
b = {1, 2, 3}

savejson('', b)
ans =
[
    1,
    2,
    3
]

c = [1, 2, 3]

 savejson('', c)
ans =
[1,2,3]

So you should be able to tell it's a cell from the line breaks in the JSON.
Another bug:

d = {1, 2, [1 2 3]}


savejson('', d)
ans =
[
    1,
    2,
    [1,2,3]
]
loadjson(ans)
Error using reshape
Size arguments must be real integers.

Error in loadjson>parse_array (line 194)
                    object=reshape(obj,dim2,numel(obj)/dim2)';

Error in loadjson (line 104)
            data{jsoncount} = parse_array(opt);


fangq commented on May 16, 2024

@sheljohn, to make the data round-trip invariant, you need to disable "FastArrayParser" by setting it to 0:

res=loadjson(savejson('',foo),'FastArrayParser',0)

(to save the data with singleton brackets, use the 'SingletArray' option shown below); otherwise

res=loadjson(savejson('',foo))

gives you a row vector, and

res=loadjson(savejson('',foo,'SingletArray',1))

gives you a column vector.

A JSON array is defined as a 1D object list; it does not have built-in definitions of columns and rows for high-dimensional arrays. So, when exporting a MATLAB cell array/matrix, I have to use [] to signify a row of a high-dimensional array. As a result, in jsonlab, [ [1], [2], [3] ] denotes a 3x1 vector, and [1, 2, 3] denotes a 1x3 row vector.
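A small Python sketch of this convention (infer_shape is a hypothetical helper, not part of jsonlab; Python's json stands in for the MATLAB parser):

```python
import json

def infer_shape(text):
    """Infer the MATLAB shape that jsonlab's convention implies for a
    numeric JSON array: each inner [] marks one row of a 2-D array.
    Hypothetical helper for illustration only."""
    v = json.loads(text)
    if v and isinstance(v[0], list):
        return (len(v), len(v[0]))  # nested: rows x columns
    return (1, len(v))              # flat: a 1xN row vector

print(infer_shape("[1,2,3]"))        # (1, 3)  row vector
print(infer_shape("[[1],[2],[3]]"))  # (3, 1)  column vector
```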

The difficulty you noticed is a result of ambiguities when mapping cells/arrays to JSON objects. A JSON array is closer to a cell. But when it has a regular numerical structure, I want MATLAB to use cell2mat to collapse it into an array. If one does not like this automatic conversion, one can use the SimplifyCell and FastArrayParser options to prevent it.

I don't see a clear fix for this, again because of the ambiguity of the mapping. So I am closing this tracker, but feel free to reopen if you want to suggest a fix.


sheljohn commented on May 16, 2024

@fangq
JSON does not have built-in definitions for container shapes, that is true, but that is not the problem here.
There is no "ambiguity" whatsoever when it comes to distinguishing between cells and vectors:

  • [ [1], [2], [3] ] maps to the (shape-less) cell {1,2,3}
  • [1, 2, 3] maps to the vector [1,2,3]
  • [ [1], [[2]], [3,4] ] maps to the cell {1,{2},[3,4]}

If you are not convinced, then you should realise that in MATLAB {1,2,3} is equivalent to {[1],[2],[3]}, and [ [1], [2], [3] ] is equivalent to [1,2,3]. So arrays of arrays are always arrays, and cells of scalars are cells of arrays; hence there is a way to distinguish unambiguously between cells and arrays in JSON (note that a cell of structs can also be distinguished from a struct array in a similar manner).

The only "ambiguity" (which cannot be produced by an automated writer) is that, in theory, [ [1], [2], 3 ], [ [1], 2, 3 ], [ 1, [2], 3 ], etc. all map to the same cell {1,2,3}, even though I would argue that they should be treated as errors; and, as I said, there is no way to generate them with a correct writer.
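The proposed rule can be sketched in a few lines of Python (a stand-in for the MATLAB types: vectors are modeled as lists, cells as tuples; this is an illustration of the rule above, not jsonlab's behavior):

```python
import json

def decode(v):
    """Proposed rule: a JSON array of numbers maps to a vector; any
    nesting makes it a cell. A singleton vector collapses to a scalar,
    mirroring MATLAB where [1] and 1 are the same value."""
    if isinstance(v, list):
        if all(isinstance(x, (int, float)) for x in v):
            return v[0] if len(v) == 1 else v   # vector (or scalar)
        return tuple(decode(x) for x in v)      # cell
    return v

print(decode(json.loads('[1,2,3]')))            # [1, 2, 3]        -> vector
print(decode(json.loads('[[1],[2],[3]]')))      # (1, 2, 3)        -> cell {1,2,3}
print(decode(json.loads('[[1],[[2]],[3,4]]')))  # (1, (2,), [3, 4]) -> cell {1,{2},[3,4]}
```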


fangq commented on May 16, 2024

From your proposed mapping, I think you have a preference for using cells as containers. In that case, just set both "FastArrayParser" and "SimplifyCell" to 0, and everything becomes a cell element. But in my opinion, a cell is not the most efficient data form for many applications, especially when processing such data inside MATLAB. On the other hand, an array is the most natural and convenient data form in MATLAB; converting cells into arrays, whenever possible, is a natural choice.

Out of curiosity, how do you propose to map [ [1,2], [3,4], [5,6] ]? Also, I don't fully understand your statement

"arrays of arrays are always arrays, and therefore can be used in JSON to represent cells without ambiguity"

In that regard, should [ [1], [2], [3] ] be mapped to an array or a cell?

You gave a valid ambiguity example at the end, but it is just one of many, many possibilities. You can add nested brackets to the elements, such as [[]] or [[], []], either uniformly or to a subset of elements. I cannot reject these forms because they are all valid JSON (check them at http://jsonlint.com/). If you prefer, you can keep the data inside their cell containers, but letting cell2mat collapse the cell elements into an array makes the data much easier to process.


sheljohn commented on May 16, 2024

I updated my comment; maybe it is clearer now? I do not have a preference for any container, and turning everything into a cell array is incorrect and not what I would expect a parser to do. I just don't understand why you would say that there is no MATLAB-complete reversible mapping with JSON, because one clearly exists.

[ [1,2], [3,4], [5,6] ] is clearly a cell, because an array of arrays in MATLAB would reduce to an array, and therefore arrays of arrays do not exist. Again, if that surprises you, it is probably because you are thinking about array shapes, which don't exist in JSON. It is idiotic to use newlines and tabulations as shape specifiers, because no one would expect a compact or binary representation to allow such characters to be present (and good luck with maintenance too). Instead, it would make sense for shaped arrays to be translated to JSON objects (whatever the source language), with parameterisable fieldnames (say a prefix parameter, for instance, because MATLAB is so stringent about what a struct fieldname can be).

Similarly, [ [], [], [] ] is a 3-cell; if you meant an empty array, you should write [], and there is no array with empty elements in MATLAB. Similarly, [ [1], [2], [3] ] is a cell, and [1,2,3] is an array.
Is that clearer?


sheljohn commented on May 16, 2024

The argument that an automated encoder "cannot produce" those counter examples is not strong enough.

If these cases cannot be produced, they should never be encountered, should they? This opinion sounds very subjective to me.

You can't predict/specify what an encoder should output.

Pardon?

For jsonlab, any valid JSON should be accepted.

Maybe that's the point where we disagree; imo the important round-trip to support starts from Matlab, not JSON (nothing starts from JSON). And that round-trip can absolutely be guaranteed, while still ensuring that any valid JSON is accepted.


sheljohn commented on May 16, 2024

Because I have consistently found issues with all the parsers I've found online so far (I didn't test the ones relying on C++ libraries, but those might be a bit overkill in terms of requirements), I started working on a version of my own. As I am doing this on the side, it will take some time, and I am not aiming for performance in the first instance, nor for Unicode support. One design requirement I have found so far is that parsing needs to be done in two depth-first passes for a correct output to be produced; one pass is not enough.

When I am done though, I would be happy to post a comment here and if you have time to give some feedback that would be great. No worries if you're not interested though.


sheljohn commented on May 16, 2024

@GeoffreyGamaya I ended up writing my own parser, and it works much better than any other one I could find online. Please give it a try if you're interested; the repository is:

https://github.com/Sheljohn/Deck

There are plenty of other things in there, but the parser can be used with:

  • dk.json.read: read JSON file
  • dk.json.decode: parse JSON string
  • dk.json.write: write to JSON file
  • dk.json.encode: return JSON string


neogeogre commented on May 16, 2024

I ran some tests, and yours seems better in terms of recovering cell values.
Thx for the link!

One bug: when you have a column cell like {1; 2; 3}, your parser returns a row cell like {1, 2, 3}.


sheljohn commented on May 16, 2024

@GeoffreyGamaya You can open an issue in my repo about this, and I can explain at length why it is not possible to strictly resolve array shapes with JSON encoding. If you want to preserve shape, then the right thing to do is to store a structure with size/type/data fields instead. Feel free to open that issue and I can give you a better explanation.
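For illustration, a minimal sketch of such a shape-preserving envelope (the size/type/data field names follow the suggestion above; the column-major ordering is an assumption, chosen to match MATLAB's memory layout):

```python
import json

# Hedged sketch: preserve a 2x3 double array's shape explicitly,
# since a plain JSON array cannot carry it.
payload = {
    "type": "double",
    "size": [2, 3],
    "data": [1, 4, 2, 5, 3, 6],  # flattened column-major, MATLAB-style
}
text = json.dumps(payload)
back = json.loads(text)
# Shape and contents survive the round trip exactly.
assert back["size"] == [2, 3] and len(back["data"]) == 6
print(text)
```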

