
kellyjonbrazil / jello

CLI tool to filter JSON and JSON Lines data with Python syntax. (Similar to jq)

License: MIT License

Python 99.91% Shell 0.09%
bash bash-scripting cli command-line command-line-interface command-line-tool filter jq json json-lines process python query scripting shell-scripting

jello's People

Contributors

kellyjonbrazil, roehling

jello's Issues

Enhancement: process each JSON Line separately

I'd like to have a switch (say, -L) that would cause jello to evaluate QUERY once per JSON line in the input. I'm not sure if this would fit in with the jello philosophy, but it sure would help me eliminate CPython startup time (and shell boilerplate) while avoiding memory bloat.

I think the JSON Line is a natural chunk size, because it avoids the problem of having to specify the chunk size (cf. ijson's "prefix" handling).

Some contrived examples, in fish shell:

# OLD
for url in $my_data_urls
    curl $url | jello _.haystack.needle
end

# NEW
curl $my_data_urls | jello -L _.haystack.needle

# OLD
find . -type f -name \*.json -print0 | while read -z jsonfile
    cat $jsonfile | jello _.haystack.needle
end

# NEW
find . -type f -name \*.json -exec cat '{}' + | jello -L _.haystack.needle

Think of it as analogous to Perl's -p switch if that helps.
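
The requested behaviour can be sketched in plain Python, outside of jello. This is only an illustration under assumptions: `eval_per_line` is a hypothetical helper, `-L` is the flag proposed above rather than an existing option, and plain dicts stand in for jello's dot-notation objects.

```python
import io
import json


def eval_per_line(stream, query):
    """Yield the result of evaluating `query` once for each JSON line in `stream`."""
    code = compile(query, "<query>", "eval")  # compile once, reuse for every line
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Only one parsed line is held in memory at a time.
        yield eval(code, {"_": json.loads(line)})


# Hypothetical sample input: two JSON Lines records.
sample = io.StringIO('{"haystack": {"needle": 1}}\n{"haystack": {"needle": 2}}\n')
results = list(eval_per_line(sample, '_["haystack"]["needle"]'))
print(results)
```

Because the interpreter starts once and each line is parsed and discarded in turn, this addresses both the startup cost and the memory concern raised above.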

'<' not supported between instances of 'DotMap' and 'DotMap'

When I try to sort ps output, I get this error:

❯ ps auxwww | jc --ps | jello 'sorted(_)'
jello:  Query Exception:  TypeError
        '<' not supported between instances of 'DotMap' and 'DotMap'
        query: sorted(_)
        data: [{'user': 'root', 'pid': 1, 'vsz': ... ercent': 0.0, 'mem_percent': 0.0}]
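
For context, Python does not define an ordering between mappings, so `sorted()` needs a `key` function that picks the field to sort on. The sketch below reproduces the situation with plain dicts and made-up sample values; the field names `user` and `pid` are taken from the output above. Inside jello the equivalent would presumably be something like `sorted(_, key=lambda p: p.pid)`, which has not been verified here.

```python
# Hypothetical sample records using field names from the jc --ps output above.
processes = [
    {"user": "root", "pid": 42},
    {"user": "root", "pid": 1},
    {"user": "daemon", "pid": 7},
]

# Sorting the records directly fails: dicts do not support '<'.
try:
    sorted(processes)
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Supplying a key function tells sorted() which field to compare.
by_pid = sorted(processes, key=lambda p: p["pid"])
pids = [p["pid"] for p in by_pid]
print(comparison_failed, pids)
```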

Error messages / exceptions sometimes hard to pinpoint to source (missing line numbers)

I already mentioned this in #57 (reply in thread) and now wanted to back this up with a proper example.

So with most python exceptions occurring within jello I find it hard to pinpoint them to the part of the query source they come from. My main issue is that they don't show any line number.

For example here:

# jc -a | jello "ret=_['version']['not a dict error']; ret"
jello:  Query Exception:  TypeError
        string indices must be integers
        query:  ret=_['version']['not a dict error']; ret
        data:  {'name': 'jc', 'version': '1.23.1', 'description': 'JSON Convert', 'author': 'Kelly Brazil', 'author_email': '[email protected]', 'website':
            'https://github.com/kellyjonbrazil/jc', 'copyrig ... tus` command parser', 'author': 'Kelly Brazil', 'author_email': '[email protected]', 'compatible': ['linux', 'darwin',
            'freebsd'], 'tags': ['command'], 'magic_commands': ['zpool status']}]}

Now this is of course a minimal example with just one line. But consider you have like 10 or 20 lines of code there. Then line numbers in the error message would really help.
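
One possible way to recover the line number, shown as a sketch rather than as jello's actual implementation: compile the query under a recognisable filename and read the matching frame from the traceback. The helper name `run_with_line_info` and the filename `<query>` are assumptions made for this example.

```python
import traceback


def run_with_line_info(query, data):
    """Run `query` and return (exception name, query line number) on failure."""
    scope = {"_": data}
    try:
        exec(compile(query, "<query>", "exec"), scope)
    except Exception as exc:
        # Keep only the traceback frames that belong to the compiled query.
        frames = [
            frame
            for frame in traceback.extract_tb(exc.__traceback__)
            if frame.filename == "<query>"
        ]
        lineno = frames[-1].lineno if frames else None
        return type(exc).__name__, lineno
    return None, None


# Hypothetical three-line query whose second line fails.
query = "a = 1\nret = _['version']['not a dict error']\nret"
error_name, error_line = run_with_line_info(query, {"version": "1.23.1"})
print(error_name, error_line)
```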

Command line option to read query from file (feature request)

A command line option to read queries from an external file might be useful for things you use frequently. Example:

> cat darwin_compatible
result = []
for entry in _.parsers:
  if "darwin" in entry.compatible:
    result.append(entry.name)
result

> jc -a | jello -rl -f darwin_compatible
airport
airport_s
arp
crontab
crontab_u
...

Help with basic usage and syntax errors

All,

I'm trying to do something rather basic: extract values from lm_sensors, which I can export as JSON-formatted data.
However, I keep running into syntax errors I don't understand.

This is a small extract of the json-data:

{
  "aquaero-hid-3-1cb4": {
    "Adapter": "HID adapter",
    "Fan 1 voltage": {
      "in0_input": 4.24
    }
  }
}

jello -s gives me:

_ = {};
_.aquaero-hid-3-1cb4 = {};
_.aquaero-hid-3-1cb4.Adapter = "HID adapter";
_.aquaero-hid-3-1cb4["Fan 1 voltage"] = {};
_.aquaero-hid-3-1cb4["Fan 1 voltage"].in0_input = 4.24;

As expected. So to get the value for the Fan1 voltage, I try:
cat x.json | jello _.aquaero-hid-3-1cb4["Fan 1 voltage"].in0_input and
cat x.json | jello '_.aquaero-hid-3-1cb4["Fan 1 voltage"].in0_input'
That, however, gives:

jello:  Query Exception:  SyntaxError
        invalid syntax (<unknown>, line 1)
        SyntaxError:  _.aquaero-hid-3-1cb4[Fan 1 voltage].in0_input
        query:  _.aquaero-hid-3-1cb4[Fan 1 voltage].in0_input
        data:  {'aquaero-hid-3-1cb4': {'Adapter': 'HID adapter', 'Fan 1 voltage': {'in0_input': 4.24}}}

And now I'm confused. Isn't the result of "jello -s" the concatenation of levels I need to get my result?
The same thing happens for names without spaces in them, btw.
Even trying to get the first level with
cat x.json | jello _.aquaero-hid-3-1cb4
results in a Syntax error.

Thanks!

Edit: After some trial and error, it seems that the "-3-1" is something the Python syntax doesn't like. I can escape the spaces, but have not found a way to get around the "-3-1"....
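
A note on why this happens: in Python a hyphen is the subtraction operator, so `_.aquaero-hid-3-1cb4` is read as an arithmetic expression and never as a single attribute name. Keys that are not valid Python identifiers can only be reached with bracket notation. The sketch below shows this with the standard `json` module and the sample data above; in jello the same bracket form, quoted for the shell, would presumably be `jello '_["aquaero-hid-3-1cb4"]["Fan 1 voltage"]["in0_input"]'`, which has not been verified here.

```python
import json

# The sample document from above.
text = """
{
  "aquaero-hid-3-1cb4": {
    "Adapter": "HID adapter",
    "Fan 1 voltage": {
      "in0_input": 4.24
    }
  }
}
"""

_ = json.loads(text)

# Bracket notation works for any key, including ones with hyphens or spaces.
voltage = _["aquaero-hid-3-1cb4"]["Fan 1 voltage"]["in0_input"]
print(voltage)
```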

scope issues in comprehensions

Example JSON

{
   "foods": [
      { "name": "carrot" },
      { "name": "banana" }
   ],
   "people": [
      { "name": "alice", "likes": "apples" },
      { "name": "bob", "likes": "banana" },
      { "name": "carrol", "likes": "carrot" },
      { "name": "dave", "likes": "donuts" }
   ]
}

What happens as of 99f6771

$ jello "foods = set(f.name for f in _.foods)
         [p.name for p in _.people if p.likes not in foods]"
jello:  Query Exception:  NameError
        name 'foods' is not defined
        query:  foods = set(f.name for f in _.foods)\n[p.name for p in _.people if p.likes not in foods]
        data:  {'foods': [{'name': 'carrot'}, {'name': 'banana'}], 'people': [{'name': 'alice', 'likes': 'apples'}, {'name': 'bob', 'likes': 'banana'}, {'name': 'carrol', 'likes': 'carrot'}, {'name':
            'dave', 'likes': 'donuts'}]}

$ jello "[p.name for p in _.people if p.likes not in (f.name for f in _.foods)]"
jello:  Query Exception:  NameError
        name '_' is not defined
        query:  [p.name for p in _.people if p.likes not in (f.name for f in _.foods)]
        data:  {'foods': [{'name': 'carrot'}, {'name': 'banana'}], 'people': [{'name': 'alice', 'likes': 'apples'}, {'name': 'bob', 'likes': 'banana'}, {'name': 'carrol', 'likes': 'carrot'}, {'name':
            'dave', 'likes': 'donuts'}]}

What I expected to happen

$ jello "foods = set(f.name for f in _.foods)
         [p.name for p in _.people if p.likes not in foods]"
[
  "alice",
  "dave"
]

$ jello "[p.name for p in _.people if p.likes not in (f.name for f in _.foods)]"
[
  "alice",
  "dave"
]

Suggested change

diff --git a/jello/lib.py b/jello/lib.py
index 60b442c..ed2f4cc 100644
--- a/jello/lib.py
+++ b/jello/lib.py
@@ -459,10 +459,12 @@ def pyquery(_θ_data, _θ_query):
     del _θ_query
     _θ_last = ast.Expression(_θ_block.body.pop().value)    # assumes last node is an expression
 
-    exec(compile(_θ_block, '<string>', mode='exec'))
+    _θ_scope = {'_': _}
+
+    exec(compile(_θ_block, '<string>', mode='exec'), _θ_scope)
     del _θ_block
 
-    _θ_output = eval(compile(_θ_last, '<string>', mode='eval'))
+    _θ_output = eval(compile(_θ_last, '<string>', mode='eval'), _θ_scope)
     del _θ_last
 
     # convert output back to normal dict
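
The idea behind the suggested change can be tried in isolation. The sketch below is not the jello source: `run_query` is a hypothetical stand-in for `pyquery`, and `types.SimpleNamespace` replaces jello's dot-notation objects. It passes one explicit scope dict to both `exec` and `eval`, so names assigned by the query and `_` itself are globals that nested comprehension scopes can see.

```python
import ast
import json
from types import SimpleNamespace


def run_query(data, query):
    """Execute all but the last statement, then evaluate the last expression."""
    block = ast.parse(query, mode="exec")
    last = ast.Expression(block.body.pop().value)  # assumes last node is an expression

    scope = {"_": data}  # one shared namespace for exec and eval
    exec(compile(block, "<string>", mode="exec"), scope)
    return eval(compile(last, "<string>", mode="eval"), scope)


text = """
{"foods": [{"name": "carrot"}, {"name": "banana"}],
 "people": [{"name": "alice", "likes": "apples"},
            {"name": "bob", "likes": "banana"},
            {"name": "carrol", "likes": "carrot"},
            {"name": "dave", "likes": "donuts"}]}
"""
data = json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

two_step = run_query(
    data,
    "foods = set(f.name for f in _.foods)\n"
    "[p.name for p in _.people if p.likes not in foods]",
)
one_step = run_query(
    data,
    "[p.name for p in _.people if p.likes not in (f.name for f in _.foods)]",
)
print(two_step, one_step)
```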

Unit test failures

I got one failed test while trying to package jello for Debian:

======================================================================
FAIL: test_dict_html (tests.test_create_json.MyTests)
Test self.dict_sample html output
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/jello-1.4.2/.pybuild/cpython3_3.9_jello/build/tests/test_create_json.py", line 450, in test_dict_html
    self.assertEqual(self.json_out.html_output(output), self.expected)
AssertionError: '<div[72 chars]125%; margin: 0;"><span></span>{\n  <span styl[1171 chars]v>\n' != '<div[72 chars]125%;"><span></span>{\n  <span style="color: #[1160 chars]v>\n'
Diff is 1505 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 188 tests in 0.314s

FAILED (failures=1)

It looks like the HtmlFormatter class in Pygments 2.7.1 adds an additional margin: 0 property to the <pre> style.

Streaming support

Enhancement to add streaming support so the entire JSON document doesn't need to be loaded to start processing.

Looks like the ijson library might handle a lot of this. I think I might be able to create a jello -S (for streaming) option that uses the ijson library to parse STDIN and return _ as a generator/iterator of JSON objects - whether it's an array or a top-level of JSON objects.
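
To illustrate what "`_` as a generator" could look like, here is a minimal sketch that uses only the standard library and handles JSON Lines input only. It does not use ijson, so it does not cover streaming a single large array; `iter_json_lines` is a hypothetical name, and `-S` remains the proposed flag, not an existing one.

```python
import io
import itertools
import json


def iter_json_lines(stream):
    """Lazily yield one parsed object per non-empty line of `stream`."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


# Hypothetical input; in a CLI this would be sys.stdin.
source = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
records = iter_json_lines(source)

# Nothing is parsed until the generator is consumed.
first_two = [record["id"] for record in itertools.islice(records, 2)]
remaining = [record["id"] for record in records]
print(first_two, remaining)
```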

Support accessing to the dict's values via attributes

Thanks @kellyjonbrazil for such a great tool! To me, jello not only has better (pythonic) filtering syntax than jq, but also gets installed via pip instead of apt-get.

The problem I faced is that quotes are a problem in many bash and make environments, and the jq-like code doesn't work with jello:

~> echo '{"a": [1,2,3]}' | jello -r _.a
jello:  AttributeError: 'dict' object has no attribute 'a'

~> echo '{"a": [1,2,3]}' | jello -r _["a"]  
zsh: no matches found: _[a]

~> echo '{"a": [1,2,3]}' | jello -r '_["a"]'
[
  1,
  2,
  3
]

So in order to work with jello, I need to use both ' and ", which is impossible in some situations.

I expect the patch would be quick -- just to replace dict with something like AttrDict.

Safety against third-parties.

Hey kelly, greetings.
I just stumbled upon this project when looking for alternatives for JQ.

Is it safe to allow untrusted third parties to send Jello scripts for querying data in our environment? I saw that you can do imports. Can you import (and use) any Python module? Would there be a way of whitelisting allowed modules? Thanks!

Any idea how to find easily values of (deeply) nested keys, when the keys not always exist?

I have a JSON file like this:

{"details": {
  "sections": [
    {"fields": [
      { "x": "don't", "v": "Yes!" },
      { "x": "don't", "w": "don't" }
    ]},
    {"fields": [
      { "x": "don't", "v": "Indeed!" }
    ]}],
  "other-stuff": "don't"}
}

I need to extract the values of the "v" keys, i.e. "Yes!" and "Indeed!". Because the "v" key is not always present, I do something like this:

cat data.json | jello -c '\
res = []
for s in _.details.sections:
    for f in s.fields:
        try:
            res.append(f.v)
        except KeyError:
            pass
res'

This works, but feels pretty cumbersome. Is there a way to gather the values more easily than with my approach? Also, it would be nice if I didn't need to know how deep my "v" keys are located.
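
One depth-independent approach, sketched in plain Python with the standard `json` module: a small recursive generator that walks dicts and lists and yields every value stored under a given key. `find_values` is a hypothetical helper, not a jello built-in; something similar could presumably be defined inside a multi-line jello query.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key`, at any depth of dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)  # keep searching below this value
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


text = """
{"details": {
  "sections": [
    {"fields": [
      { "x": "don't", "v": "Yes!" },
      { "x": "don't", "w": "don't" }
    ]},
    {"fields": [
      { "x": "don't", "v": "Indeed!" }
    ]}],
  "other-stuff": "don't"}
}
"""

values = list(find_values(json.loads(text), "v"))
print(values)
```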

Feature request: support for other input formats (like raw string, yaml, csv,...)

Currently jello can only read JSON and JSON Lines.

Sometimes it would be helpful for me to be optionally able to directly read other formats too. For example yaml, toml, csv, raw strings,...

For many of these there are cli converters available, but sometimes these converters lack options or have complicated dependencies. So it would be nice to have that option integrated in jello.

Suggested option: -p <input format> Pipe input format name.

[Edit:] maybe better -F <input format> because "Pipe" suggests that it would just work for stdin. But it should of course also work for reading from a file with -f.

This could maybe be implemented using python-benedict as suggested in #62 . benedict already has importers for many formats that would make sense in this context: https://github.com/fabiocaccamo/python-benedict#io-methods benedict of course uses several libraries for this and they could of course also be directly called by jello without using benedict for it.

If you don't like this idea because the preferred way is to use commandline converters that should be piped before jello, ok. But then please at least consider adding a raw string input mode that stores the whole pipe input into _ as string. That would allow the user to write python code to further parse/convert the input.

Failure to call `main()` in cli.py:211-212 causes `python -m jello.cli` to be a no-op

Issue:

jello/jello/cli.py

Lines 211 to 212 in 0bba08a

if __name__ == '__main__':
    pass

The purpose of making python -m jello.cli work is that it allows for per-interpreter, virtualenv and venv installs to be used by referencing the interpreter.

However due to the use of a pass instead of:

if __name__ == "__main__":
    main()

It is not possible to use jello without letting it stomp into the $PATH

Remediation:

  • replace pass with main()
  • alternatively you can add a __main__.py file and have it call jello.cli.main() after an import jello.cli - that would make python -m jello work which satisfies the Principle Of Least Astonishment
