Comments (11)

halaxa commented on June 11, 2024

TL;DR Just omit the dash at the end.

Hi. Your example works as expected; it seems the JSON Pointer (the pointer option) is just not used correctly in your case. The pointer option means "iterate over the items in this element". If you only need to iterate over the items in the user-provided-parameters key, use /user-provided-parameters as the pointer. The dash at the end means "any index", so it matches /user-provided-parameters/0, /user-provided-parameters/1, and so on, and then tries to iterate over whatever is inside the element at that index. If you need more explanation, let me know or have a second look at the JSON Machine documentation.

from json-machine.

jakajancar commented on June 11, 2024

Thanks for the quick response! You're right.

I tried to reduce the case and did it incorrectly. Let me try again:

Let's say we have a number[][] matrix where we want to iterate through cells, same as:

function cells($matrix) {
    foreach ($matrix as $row) {
        foreach ($row as $cell) {
            yield $cell;
        }
    }
}

$options = ['pointer' => '/table/-'];

Items::fromString('{"table": [[1,2], [3,4]]}', $options);
// Expected: [1,2,3,4]
// Actual: same

Items::fromString('{"table": [[1,2], 3]}', $options);
// Expected: error
// Actual: [1,2,3]

Is this possible?

jakajancar commented on June 11, 2024

And the reason I was using /table/-/- was because then you get nice results in getCurrentJsonPointer():

1  -  /table/0/0
2  -  /table/0/1
3  -  /table/1/0
4  -  /table/1/1
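
For reference, the behavior asked for here can be written down in plain PHP on an already-decoded array. This is only a sketch of the expectation, not of how JSON Machine works: strictCells() is a hypothetical helper, it uses json_decode() instead of streaming, and it keys each cell by the pointer it would have under /table/-/-.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of JSON Machine): yields every cell of
 * $decoded['table'], keyed by its concrete JSON Pointer.
 * Throws if a row is not an array (the "Expected: error" case).
 * Note: json_decode(..., true) also turns JSON objects into arrays,
 * so this sketch does not tell objects and arrays apart.
 */
function strictCells(array $decoded): Generator
{
    foreach ($decoded['table'] as $rowIndex => $row) {
        if (!is_array($row)) {
            throw new UnexpectedValueException("/table/$rowIndex is not an array");
        }
        foreach ($row as $cellIndex => $cell) {
            yield "/table/$rowIndex/$cellIndex" => $cell;
        }
    }
}

$cells = iterator_to_array(strictCells(json_decode('{"table": [[1,2], [3,4]]}', true)));
// $cells === ['/table/0/0' => 1, '/table/0/1' => 2, '/table/1/0' => 3, '/table/1/1' => 4]
```

Feeding it {"table": [[1,2], 3]} throws an UnexpectedValueException once iteration reaches /table/1.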

jakajancar commented on June 11, 2024

What are your thoughts on an option "flatten" => false (default true), under which, taking your examples:

JSON Pointer value | Will iterate through
(empty string - default) | ["this", "array"] or {"a": "this", "b": "object"} will be iterated (main level)
/result/items | {"result": {"items": ["this", "array", "will", "be", "iterated"]}}
/0/items | [{"items": ["this", "array", "will", "be", "iterated"]}] (supports array indices)
/results/-/status | {"results": [{"status": "iterated"}, {"status": "also iterated"}]} (a hyphen as an array index wildcard)
/ (gotcha! - a slash followed by an empty string, see the spec) | {"":["this","array","will","be","iterated"]}
/quotes\" | {"quotes\"": ["this", "array", "will", "be", "iterated"]}

all of them would return a single item, except /results/-/status (with an explicit wildcard), which would return the same as today?

halaxa commented on June 11, 2024

I'm not sure what the question is now. Can you be more specific?

Anyway, let me just elaborate a little on the flatten topic. JSON Machine supports finding data in a JSON down to a single scalar value if needed. It does that automatically. If it finds a scalar value at a pointer instead of an object or an array, it just yields it in a single iteration. So it might seem it somehow flattens the structure when used in combination with - and when the structure is not rigid. But in reality, no such thing happens.

Try this and you'll see no deep flattening is happening:

$options = ['pointer' => '/table/-'];

Items::fromString('{"table": [[[1,2]], [3,4]]}', $options);
// Expected: [[1,2],3,4]

Also, this example is not expected to produce an error:

$options = ['pointer' => '/table/-'];
Items::fromString('{"table": [[1,2], 3]}', $options);

because at /table/0 there is [1,2] which is sequentially iterated, and at /table/1 there is 3 which is a scalar value and as such it's simply yielded as a single value.

jakajancar commented on June 11, 2024

I would expect a behavior where:

  • For every non-wildcard pointer component:
    • Machine navigates into the object property/array element, and
    • the number of items does not increase.
  • For every wildcard pointer component:
    • Machine explodes the object properties/array elements,
    • the number of the items increases, and
    • the key/index is available using getCurrentJsonPointer().

Currently, even a non-wildcard component explodes the items (but has nowhere to indicate this in the path), if the element pointed to is an object/array. It is this behavior that I would like to have a way to disable.
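
The rules above can be sketched as a small reference model in plain PHP. Everything here is an assumption made for illustration: expand() is a hypothetical function, it works on json_decode() output rather than on a stream, it ignores JSON Pointer escaping (~0, ~1), and it skips anything it cannot navigate into.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reference model of the proposed semantics (not JSON Machine code).
 * $tokens are the pointer components, e.g. ['2d', '-', '-'] for /2d/-/-.
 * Yields matched values keyed by their concrete JSON Pointer.
 */
function expand(mixed $value, array $tokens, string $path = ''): Generator
{
    if ($tokens === []) {
        // End of the pointer: yield the value as-is, without descending further.
        yield $path => $value;
        return;
    }
    if (!is_array($value)) {
        // Nothing to navigate into: the item is skipped.
        return;
    }
    $token = array_shift($tokens);
    if ($token === '-') {
        // Wildcard: explode into children, so the item count may grow.
        foreach ($value as $key => $child) {
            yield from expand($child, $tokens, "$path/$key");
        }
    } elseif (array_key_exists($token, $value)) {
        // Plain component: navigate only, so the item count stays the same.
        yield from expand($value[$token], $tokens, "$path/$token");
    }
}

$decoded = json_decode('{"2d": [[1,2], [3,[4,5]]]}', true);
$result = iterator_to_array(expand($decoded, ['2d', '-', '-']));
// $result === ['/2d/0/0' => 1, '/2d/0/1' => 2, '/2d/1/0' => 3, '/2d/1/1' => [4, 5]]
```

Under this model the nested [4,5] stays a single item because the pointer has no third wildcard.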


Below is (yet another) example, which demonstrates both my concerns (indexes in getCurrentJsonPointer() and unpredictable levels).

Say you have a two-level array mixed[][], where all of these are valid:

{"2d": [[1,2], [3]]}
    $value['2d'][0][0] (/2d/0/0) = 1
    $value['2d'][0][1] (/2d/0/1) = 2
    $value['2d'][1][0] (/2d/1/0) = 3
{"2d": [[1,2], [3,true]]}
    $value['2d'][0][0] (/2d/0/0) = 1
    $value['2d'][0][1] (/2d/0/1) = 2
    $value['2d'][1][0] (/2d/1/0) = 3
    $value['2d'][1][1] (/2d/1/1) = true
{"2d": [[1,2], [3,[4,5]]]}
    $value['2d'][0][0] (/2d/0/0) = 1
    $value['2d'][0][1] (/2d/0/1) = 2
    $value['2d'][1][0] (/2d/1/0) = 3
    $value['2d'][1][1] (/2d/1/1) = [4,5]

The following is not valid, because it's not really mixed[][]:

{"2d": [[1,2], false]}
    $value['2d'][0][0] (/2d/0/0) = 1
    $value['2d'][0][1] (/2d/0/1) = 2
    $value['2d'][1][0] = error

I would like to

  1. properly get the elements in the valid examples,
  2. know their indexes, and
  3. (ideally) somewhat gracefully handle the invalid example (error or ignore the non-matching value).

This cannot currently be achieved:

  • If you use /2d/-/-
    • ✅ You do get both indices.
    • ❌ Third valid example ([[1,2], [3,[4,5]]]) gets flattened (and you get 5 items)
    • ✅ The invalid example ignores the invalid element.
  • If you use /2d/-:
    • ❌ You do not get both indices, only the first.
    • ✅ Third valid example doesn't get flattened (properly get 4 elements)
    • ❌ The invalid example gets silently ignored (you get same items as first valid example)

halaxa commented on June 11, 2024
  • If you use /2d/-/-

    • ✅ You do get both indices.
    • ❌ Third valid example ([[1,2], [3,[4,5]]]) gets flattened (and you get 5 items)
      • That's a feature, not a bug as explained earlier.
    • ✅ The invalid example ignores the invalid element.
  • If you use /2d/-:

    • ❌ You do not get both indices, only the first.
      • Ok, this seems weird. Can you give the exact output? Could it be the same problem as #100?
    • ✅ Third valid example doesn't get flattened (properly get 4 elements)
    • ❌ The invalid example gets silently ignored (you get same items as first valid example)
      • Not-found items get ignored. That's normal behavior. It's as if you wanted the find command to fail on every existing file in the searched dir that does not match searched string.

halaxa commented on June 11, 2024

Sorry for being brief ;)

jakajancar commented on June 11, 2024

No worries, I appreciate your responses, responsiveness, and patience with me iterating on trying to get the best example.

  • If you use /2d/-/-

    • ❌ Third valid example ([[1,2], [3,[4,5]]]) gets flattened (and you get 5 items)

      • That's a feature, not a bug as explained earlier.

Yes, I understand. But disabling this feature is essentially my feature request! :D

  • If you use /2d/-:

    • ❌ You do not get both indices, only the first.

I'm not saying that the items do not get iterated over, just that the getCurrentJsonPointer() return value doesn't contain both indices (which makes sense, since there is no "placeholder" for them).

  • ❌ The invalid example gets silently ignored (you get same items as first valid example)

    • Not-found items get ignored. That's normal behavior. It's as if you wanted the find command to fail on every existing file in the searched dir that does not match searched string.

By "silently ignored" I don't mean that it is not returned by the iterator (that's what happens with /2d/-/- and that's OK), but that it is returned exactly as it would be if it were part of a different structure.


Perhaps I owe an explanation for this admittedly weird use-case:

I'm querying OpenAI's text completions AI with the new function calling/structured output mechanism, which returns JSON. JSON Machine is used to return results in a streaming fashion to the user live (see videos here if curious). That table should be string[][] and 95% of the time it is, but occasionally the model hallucinates and omits a level of nesting, adds a level of nesting, returns the wrong number of rows or cells. So when iterating over /2d/-/- I check both the indexes to be monotonically increasing with no gaps, that the values are indeed string, and so on... very defensively.
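
The defensive checks described above might look roughly like the sketch below. It is an illustration only: validateCells() is a hypothetical helper, and it assumes the cells arrive as pointer => value pairs in iteration order (with JSON Machine, the pointer would come from getCurrentJsonPointer() while iterating /2d/-/-).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator: $cells maps pointers such as "/2d/0/1" to values,
 * in iteration order. Returns a list of problems; an empty list means the
 * stream looked like a gap-free string[][].
 */
function validateCells(iterable $cells): array
{
    $problems = [];
    $prevRow = -1;
    $prevCol = -1;
    foreach ($cells as $pointer => $value) {
        if (!preg_match('~^/2d/(\d+)/(\d+)$~', (string) $pointer, $m)) {
            $problems[] = "unexpected pointer $pointer";
            continue;
        }
        $row = (int) $m[1];
        $col = (int) $m[2];
        // A cell must either continue the current row or start the next one.
        $continuesRow = $row === $prevRow && $col === $prevCol + 1;
        $startsNextRow = $row === $prevRow + 1 && $col === 0;
        if (!$continuesRow && !$startsNextRow) {
            $problems[] = "gap or reordering at $pointer";
        }
        if (!is_string($value)) {
            $problems[] = "non-string value at $pointer";
        }
        $prevRow = $row;
        $prevCol = $col;
    }
    return $problems;
}

$ok = validateCells(['/2d/0/0' => 'a', '/2d/0/1' => 'b', '/2d/1/0' => 'c']);
$gap = validateCells(['/2d/0/0' => 'a', '/2d/2/0' => 'b']);
// $ok === [] and $gap === ['gap or reordering at /2d/2/0']
```

A jump in the row index is the only trace that a non-array row was skipped, which is why the check looks at both indices.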


To recap, I don't think option 2 (/2d/-) is the way forward. /2d/-/- is mostly there, but I would prefer not to have that auto-descent feature.

halaxa commented on June 11, 2024

But disabling this feature is essentially my feature request! :D

Now it makes perfect sense 😁. Because in terms of JSON Machine, there's no 'flattening', I'd suggest modifying the scalar parsing logic, which is what's actually behind your problem. Maybe an option something like iterate_scalars, with three settings:

  • AUTO (current behavior, would remain the default)
  • ALWAYS/ONLY/FORCE (an iterable on the pointer position will throw)
  • NEVER (a scalar on the pointer position will throw)

This example of yours:

$options = ['pointer' => '/table/-'];

Items::fromString('{"table": [[1,2], 3]}', $options);
// Expected: error
// Actual: [1,2,3]

would then throw an error with option 'iterate_scalars' => NEVER
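
To make the three settings concrete, here is a plain-PHP model of what the proposal could mean for a single value found at the pointer position. None of it exists in JSON Machine: the constants, itemsAt() and tableItems() are hypothetical names, and json_decode() stands in for the streaming parser.

```php
<?php
declare(strict_types=1);

// Hypothetical modes mirroring the proposal; not a JSON Machine API.
const ITERATE_SCALARS_AUTO = 'auto';
const ITERATE_SCALARS_ALWAYS = 'always';
const ITERATE_SCALARS_NEVER = 'never';

/** Models what is yielded for one value found at the pointer position. */
function itemsAt(mixed $found, string $mode = ITERATE_SCALARS_AUTO): Generator
{
    $isIterable = is_array($found);
    if ($mode === ITERATE_SCALARS_NEVER && !$isIterable) {
        throw new UnexpectedValueException('Scalar found at the pointer position');
    }
    if ($mode === ITERATE_SCALARS_ALWAYS && $isIterable) {
        throw new UnexpectedValueException('Iterable found at the pointer position');
    }
    if ($isIterable) {
        yield from $found; // iterate the array or object
    } else {
        yield $found;      // a scalar is yielded as a single item
    }
}

/** Applies itemsAt() to every element of "table", i.e. models pointer /table/-. */
function tableItems(string $json, string $mode): array
{
    $out = [];
    foreach (json_decode($json, true)['table'] as $found) {
        foreach (itemsAt($found, $mode) as $item) {
            $out[] = $item;
        }
    }
    return $out;
}

$auto = tableItems('{"table": [[1,2], 3]}', ITERATE_SCALARS_AUTO);
// $auto === [1, 2, 3]; with ITERATE_SCALARS_NEVER the same input throws at /table/1
```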

halaxa commented on June 11, 2024

Also for a less predictable structure maybe #36 would help?
