salspaugh / splparser

31.0 31.0 16.0 1.07 MB

Simple parser for Splunk Processing Language (SPL) written in Python.

License: Other

Shell 0.13% Python 99.87%

splparser's People

Contributors

Stargazers

Watchers

Forkers

stevedh yangzb09 linearregression etsangsplk bjianhang swordfisher mithrilwoodrat gschmidtee xycloud lowell80 sallelujah duanshuaimin jon-athon jasonli-cn rainmana k3rbyte

splparser's Issues

fillnull should work when there's only value field

eg.
fillnull value="0" freeSpace, capacity, maxFileSize | fillnull value="N/A"

Parser litters directories from which it's called with LALR table files named <cmd>_parsetab.py.

Background: LALR tables are required for parsing. Because constructing these tables is expensive, PLY writes them to a file the first time it builds them and reads that file on subsequent parser calls. By default this file is named ./parsetab.py. In splparser, the parser for each command writes its parse table to a file named ./<cmd>_parsetab.py.

The issue is that any directory from which you call the parser becomes littered with dozens of such files. A second, less important problem is that the cost of table creation is incurred again every time you call the parser from a new directory.

Proposed solution: Write all LALR table files to a single location in the file system (probably configured at installation), for example, /etc/splparser/parsetabs/<cmd>_parsetab.py.
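A minimal sketch of the proposed solution, assuming PLY 3.x, whose yacc.yacc() accepts the outputdir and tabmodule keyword arguments. The helper name parsetab_kwargs and the directory used in the demo are hypothetical and not part of splparser. PLY reloads a cached table by importing the module, so the shared directory would likely also need to be importable (for example, on sys.path) for the cache to be reused.

```python
import os
import tempfile


def parsetab_kwargs(cmd, base_dir):
    """Build keyword arguments that send one command's LALR table to base_dir.

    Hypothetical helper: the returned dict is meant to be passed as
    ply.yacc.yacc(**kwargs), which takes `outputdir` and `tabmodule` in PLY 3.x.
    """
    os.makedirs(base_dir, exist_ok=True)  # create the shared directory if missing
    return {
        "outputdir": base_dir,            # directory where <cmd>_parsetab.py is written
        "tabmodule": cmd + "_parsetab",   # module name, without the .py suffix
    }


if __name__ == "__main__":
    # Assumed location for illustration; a real install would configure this.
    shared = os.path.join(tempfile.gettempdir(), "splparser", "parsetabs")
    print(parsetab_kwargs("search", shared))
    # Intended use inside a command's parser module (requires PLY):
    #   parser = ply.yacc.yacc(**parsetab_kwargs("search", shared))
```

Keeping the path logic in one helper means every command parser picks up the same location, so changing the directory later touches one place.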

The parsetree module should make erroneous ParseTreeNodes accessible.

A recent policy is that when parsing queries that do not match the documentation, we still "accept" them by having a valid parser rule for them (within reason: we shouldn't accept "cat on the keyboard"-like garbage). In addition to accepting them, we set an "error" flag to True on the ParseTreeNode containing the erroneous part of the query, i.e., p[0].error = True.

For this to be useful, we need some way of indicating that the entire parse tree comes from an erroneous query.

Viable options (not mutually exclusive) for this include:

Propagate the error up to the root, and have it set an attribute as a flag, like "self.haserrors = True", and possibly keep a pointer to the erroneous children.
Add a function that returns True when a node or its subtree contains an error, and a function for printing out the error.
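The second option can be sketched as follows. The ParseTreeNode class here is a toy stand-in with an assumed shape (role, raw, error, children), not the real splparser class, and the method names has_errors, error_nodes, and format_errors are hypothetical.

```python
class ParseTreeNode(object):
    """Toy stand-in for splparser's node class (assumed shape, not the real one)."""

    def __init__(self, role, raw="", error=False):
        self.role = role
        self.raw = raw
        self.error = error      # flag a grammar rule would set via p[0].error = True
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def error_nodes(self):
        """Return every node in this subtree whose error flag is set."""
        found = [self] if self.error else []
        for child in self.children:
            found.extend(child.error_nodes())
        return found

    def has_errors(self):
        """True when this node or any descendant is flagged."""
        return bool(self.error_nodes())

    def format_errors(self):
        """One line per erroneous node, suitable for printing."""
        return ["%s: %r" % (n.role, n.raw) for n in self.error_nodes()]


if __name__ == "__main__":
    root = ParseTreeNode("ROOT")
    stage = root.add_child(ParseTreeNode("STAGE"))
    stage.add_child(ParseTreeNode("FIELD", raw="host"))
    stage.add_child(ParseTreeNode("OPTION", raw="valeu=0", error=True))
    print(root.has_errors())
    print("\n".join(root.format_errors()))
```

Computing the answer on demand avoids having to keep a root-level flag in sync while the tree is still being built; caching it on the root, as in the first option, would trade that simplicity for faster repeated checks.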

Warnings printed out when parsetabs created need to be addressed.

When the parsetabs are created you might notice warnings like:

WARNING: 53 shift/reduce conflicts
WARNING: Token 'OUTPUTNEW' defined, but not used
WARNING: Token 'ASUC' defined, but not used
WARNING: Token 'ASLC' defined, but not used
WARNING: Token 'OUTPUT' defined, but not used
WARNING: /Users/salspaugh/splparser/splparser/rules/common/valuerules.py:5: Rule 'value' defined, but not used
WARNING: There are 4 unused tokens
WARNING: There is 1 unused rule
WARNING: Symbol 'value' is unreachable

Two things should be done:
(1) Change the logging so that the warnings get written to the logs and not to stderr, so that we can identify which command is generating which warning.
(2) Fix all warnings except those dealing with shift/reduce conflicts.

@richzeng Since it looks like the dedup command is responsible for some of these warnings, do you want to take a stab at this? Another command you wrote might also be doing this since it gives the same warnings, but because the warnings are written to stderr without any indication of which command causes them, it's hard to be sure. The second part should be really easy to address and is just a code cleanliness thing. The first part shouldn't be hard either, and is a usefulness thing. The open question is whether the debug / logging info should be passed to the parser when it's created, to the parse call, or somewhere else. Depending on which it is, it will involve changes to splparser/parser.py or splparser/decorators.py.
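One way to approach part (1), sketched under the assumption that PLY 3.x is in use: yacc.yacc() accepts an errorlog argument, and a standard logging.Logger can be passed for it. The function name make_command_logger and the "splparser.parsetab.<cmd>" logger naming scheme are hypothetical. An in-memory stream stands in for the log file so the sketch runs without PLY.

```python
import io
import logging


def make_command_logger(cmd, stream):
    """Return a logger for one command that tags each record with the command name."""
    logger = logging.getLogger("splparser.parsetab." + cmd)  # hypothetical naming scheme
    logger.setLevel(logging.WARNING)
    logger.propagate = False          # keep records off the root logger's handlers
    logger.handlers = []              # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s] %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buf = io.StringIO()               # stands in for a log file
    log = make_command_logger("dedup", buf)
    # With PLY installed, the logger would be handed to the table builder:
    #   ply.yacc.yacc(errorlog=log, ...)
    log.warning("Token %r defined, but not used", "OUTPUTNEW")
    print(buf.getvalue(), end="")
```

Because the logger name appears in every record, a warning such as an unused token can be traced back to the command whose grammar produced it.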

Failed test cases cascade when there's an error

The way most tests are structured right now causes errors to cascade to later tests. eg.

>>> parsed = splparser.parser.parse("blahblah1")
>>> parsed.print_tree()
# tree prints
>>> parsed = splparser.parser.parse("blahblah2")
# ERROR
>>> parsed.print_tree()
# prints the tree for blahblah1

We should fix this by changing the test cases to say

>>> splparser.parser.parse("blahblah1").print_tree()

Fix error handling in splparser/decorators.py: print e.args or e.message, whichever is not ""

See LexError for example of field you might need to print.
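A possible shape for this fix, as a sketch only: the helper name describe_exception is hypothetical, and how splparser/decorators.py currently formats errors is not shown here. PLY's LexError stores its message in args and the unlexed input in text, so the helper checks args, then message, then text.

```python
def describe_exception(exc):
    """Return the first non-empty description found on an exception.

    Checks `args`, then `message`, then `text` (PLY's LexError keeps the
    remaining input in `text`). `message` is absent on Python 3 exceptions,
    so the attributes are read with getattr and a default.
    """
    parts = [str(a) for a in exc.args if str(a)]
    if parts:
        return ", ".join(parts)
    for attr in ("message", "text"):
        value = getattr(exc, attr, "")
        if value:
            return str(value)
    return exc.__class__.__name__     # last resort: at least name the error type


if __name__ == "__main__":
    print(describe_exception(ValueError("unexpected token")))
    print(describe_exception(KeyError()))
```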

Reconsider the placement of groupby's in commands with groupby's

fix bug: when search is the parameter of other commands

multikv command can have no fields; we should omit new line (\n)

eg.
search set_sos_index host="SPLUNK1.EDM.LOCAL" sourcetype="lsof"\n | head 1\n | multikv \n | get_splunk_process_type_lsof

Can not parse "," value in EVAL function argument

Steps to Reproduce:

import splparser.parser
splparser.parser.parse('''EVAL xrefs = mvjoin(mvzip('plugin.xrefs{}.type', 'plugin.xrefs{}.id', " #"), ",")''')

audit is a command, but it is parsed as a field when it appears after the search command

eg.
search audit failure | fields + Message

Fix rex tests so that they don't cause a cascading error.

Most tests have been changed from:

parsed = splparser.parser.parse("search foo")
parsed.print_tree()

to:

splparser.parser.parse("search foo").print_tree()

The former style causes errors to cascade. The rex tests need to be changed to the latter style so that they no longer have this problem.

Error messages get written to stdout instead of stderr.

Error messages should go to stderr.

run_tests.py should throw an error when the specified module doesn't exist.
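A minimal sketch of such a check using only the standard library. The function name require_module is hypothetical, and how run_tests.py currently resolves module names is not shown here, so this would need adapting to its actual lookup.

```python
import importlib.util
import sys


def require_module(name):
    """Raise ImportError if `name` cannot be located, instead of silently skipping it."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        spec = None   # a missing parent package raises rather than returning None
    if spec is None:
        raise ImportError("test module %r does not exist" % name)
    return name


if __name__ == "__main__":
    # Validate every module named on the command line before running anything.
    for candidate in sys.argv[1:]:
        require_module(candidate)
```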

Implement tests for macros and user-defined commands

Self-explanatory.

Use classes to encapsulate parsers.

Should be straightforward.

Make sure all option regexes in search regexes have proper ending token

Otherwise this is a possible bug.

tstats is not working!

eg.

'tstats max(time) FROM datamodel=Web'

Even this simple query does not work.

Inconsistent search command output

The output of the search command when there is an argument like key=value is sometimes:

('EQ')
('KEY')
('VALUE')

and other times:
('KEY')
('VALUE')

This should be consistent across commands.

Refactor code so that there are not so many near-duplicate files in test/splparser/rules/

Add a -e (--exclude) flag to run_tests.py which will exclude the tests for whichever modules are listed after the flag.

Self-explanatory from title.
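A sketch of how the flag could look with argparse. The function names and the positional modules argument are assumptions for illustration; the existing argument handling in run_tests.py is not shown here.

```python
import argparse


def build_arg_parser():
    parser = argparse.ArgumentParser(description="Run splparser tests (sketch).")
    parser.add_argument("modules", nargs="*", help="test modules to run")
    parser.add_argument("-e", "--exclude", nargs="+", default=[], metavar="MODULE",
                        help="test modules to skip")
    return parser


def select_modules(available, argv):
    """Apply the positional list and the exclude list to the available modules."""
    args = build_arg_parser().parse_args(argv)
    wanted = args.modules or list(available)   # no positional arguments means "all"
    excluded = set(args.exclude)
    return [m for m in wanted if m not in excluded]


if __name__ == "__main__":
    print(select_modules(["search", "rex", "eval"], ["-e", "rex"]))
```

With this layout, `run_tests.py -e rex` would run everything except the rex tests, and `run_tests.py search rex -e rex` would run only the search tests.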

Individual command logging is not working.

Problem: Although each command sets up logging and specifies a file of the name parser.log, these log files are not written to. It appears that all debugging messages get logged to splparser.log instead.

Possible solutions:
(1) Each command has its own logging: Fix it so that each command writes to its own log file. In this case, choose a single place in the filesystem where all command logs are written (configured at installation time).
(2) All commands share logging: Although it's not clear why the current behavior arises, it may be acceptable for all commands to write to splparser.log. In this case, the duplicate code in each command parser module that tries (and fails) to set up logging can be removed.
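Option (1) could look like the sketch below. The function name command_file_logger, the "splparser.<cmd>" logger names, and the <cmd>.log file names are all hypothetical. One possible explanation for the current behavior, offered only as a guess: logging.basicConfig() does nothing once the root logger already has handlers, so if every command module calls it with a different filename, only the first call takes effect. Giving each command a named logger with its own FileHandler avoids depending on basicConfig.

```python
import logging
import os
import tempfile


def command_file_logger(cmd, log_dir):
    """Return a logger for `cmd` that writes to <log_dir>/<cmd>.log."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger("splparser." + cmd)   # hypothetical naming scheme
    logger.setLevel(logging.DEBUG)
    logger.propagate = False          # do not also write to the shared root log
    if not logger.handlers:           # attach the file handler only once
        handler = logging.FileHandler(os.path.join(log_dir, cmd + ".log"))
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Assumed location for illustration; a real install would configure this.
    log_dir = os.path.join(tempfile.gettempdir(), "splparser-logs")
    command_file_logger("stats", log_dir).debug("stats parser built")
    command_file_logger("rex", log_dir).debug("rex parser built")
    print(sorted(os.listdir(log_dir)))
```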