salspaugh / splparser
Simple parser for Splunk Processing Language (SPL) written in Python.
License: Other
eg.
fillnull value="0" freeSpace, capacity, maxFileSize | fillnull value="N/A"
Background: LALR tables are required for parsing. Because construction of these tables is expensive, PLY writes them to a file the first time it builds them and references that on subsequent parser calls. By default this file is named ./parsetab.py. In splparser, the parser for each command writes its parse table to a file named ./_parsetab.py.
The issue with this is that any directory from which you call the parser becomes littered with dozens of such files. Also bad, but somewhat less important: any time you move to a new directory, the expense of table creation is incurred again.
Proposed solution: Write all LALR table files to a single location in the file system (probably configured upon installation), for example, /etc/splparser/parsetabs/_parsetab.py.
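A minimal sketch of the proposed solution, assuming the parser is built with PLY's `yacc.yacc()`, which accepts `tabmodule` and `outputdir` keyword arguments. The helper name `parsetab_kwargs`, the `SPLPARSER_PARSETAB_DIR` environment variable, and the default directory are hypothetical and not part of splparser.

```python
import os

# Hypothetical default; the issue suggests a location such as /etc/splparser/parsetabs/.
DEFAULT_PARSETAB_DIR = os.path.join(os.path.expanduser("~"), ".splparser", "parsetabs")


def parsetab_kwargs(command, base_dir=None):
    """Return keyword arguments for yacc.yacc() so every table lands in one directory."""
    # Precedence: explicit argument, then (hypothetical) environment variable, then default.
    outputdir = base_dir or os.environ.get("SPLPARSER_PARSETAB_DIR", DEFAULT_PARSETAB_DIR)
    os.makedirs(outputdir, exist_ok=True)
    return {"tabmodule": f"{command}_parsetab", "outputdir": outputdir}


# Assumed usage inside a command's parser module (requires PLY):
#   parser = ply.yacc.yacc(module=rules, **parsetab_kwargs("dedup"))
```

PLY reads cached tables back by importing the table module, so the chosen directory may also need to be importable (for example, on `sys.path`) for the cache to be reused; that detail should be checked against the PLY version in use.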
A recent policy is that when parsing queries that do not match the documentation, we still continue to "accept" them by having a valid parser rule for them (within reason -- we shouldn't accept 'cat on the keyboard'-like garbage). But in addition to accepting them, we set an "error" flag to True on the relevant ParseTreeNode containing the erroneous part of the query, i.e., p[0].error = True.
In order for this to be useful we need some way of indicating that the entire parse tree is from an erroneous query.
Viable options (not mutually exclusive) for this include:
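The list of options is not preserved here. As one illustration only (an assumption, not necessarily one of the options the author had in mind), the whole tree can be checked by walking it and looking for any node whose error flag is set. The `ParseTreeNode` below is a stand-in with assumed `role` and `children` attributes, not splparser's actual class.

```python
class ParseTreeNode:
    """Stand-in for splparser's ParseTreeNode; only the fields used here (assumed names)."""

    def __init__(self, role, children=None, error=False):
        self.role = role
        self.children = list(children or [])
        self.error = error


def tree_has_error(node):
    """Return True if this node or any of its descendants has its error flag set."""
    return node.error or any(tree_has_error(child) for child in node.children)


# A tree in which only a deeply nested node is flagged as erroneous.
root = ParseTreeNode("ROOT", [ParseTreeNode("STAGE", [ParseTreeNode("COMMAND", error=True)])])
```

Calling `tree_has_error(root)` lets a caller decide whether the query as a whole should be treated as erroneous without changing how individual rules set the flag.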
When the parsetabs are created you might notice warnings like:
WARNING: 53 shift/reduce conflicts
WARNING: Token 'OUTPUTNEW' defined, but not used
WARNING: Token 'ASUC' defined, but not used
WARNING: Token 'ASLC' defined, but not used
WARNING: Token 'OUTPUT' defined, but not used
WARNING: /Users/salspaugh/splparser/splparser/rules/common/valuerules.py:5: Rule 'value' defined, but not used
WARNING: There are 4 unused tokens
WARNING: There is 1 unused rule
WARNING: Symbol 'value' is unreachable
Two things should be done:
(1) Change the logging so that the warnings get written to the logs and not to stderr, so that we can identify which command is generating which warning.
(2) Fix all warnings except those dealing with shift/reduce conflicts.
@richzeng Since it looks like the dedup command is responsible for some of these warnings, do you want to take a stab at this? Another command you wrote might also be doing this since it's giving the same warnings, but since the warnings just get written to stderr without any indication which command is causing them, it's hard to be sure. The second part should be really easy to address and is really just a code cleanliness thing. The first part shouldn't be hard either, and is a usefulness thing. The question is whether you pass the debug / logging info to the parser when it's created or to the parse command or etc. Depending on which it is, it will involve changes to splparser/parser.py or splparser/decorators.py.
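For part (1), a rough sketch of a per-command logger that writes to a file rather than stderr. It assumes the parser is created with PLY's `yacc.yacc()`, which accepts a logger-like object through its `errorlog` argument. The helper name `command_logger` and the logger naming scheme are hypothetical.

```python
import logging


def command_logger(command, logfile):
    """Build a logger named after the command that writes to logfile instead of stderr."""
    logger = logging.getLogger(f"splparser.{command}")
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # do not pass records on to the root logger's handlers
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(logfile)
        # Including the logger name shows which command produced each warning.
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger


# Assumed usage when the parser for a command is created (requires PLY):
#   yacc.yacc(module=rules, errorlog=command_logger("dedup", "splparser.log"))
```

Because the logger name is part of each record, all commands can share one log file and the source of a warning such as "Token 'OUTPUT' defined, but not used" is still identifiable.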
The way most tests are structured right now causes errors to cascade to later tests. eg.
>>> parsed = splparser.parser.parse("blahblah1")
>>> parsed.print_tree()
# tree prints
>>> parsed = splparser.parser.parse("blahblah2")
# ERROR
>>> parsed.print_tree()
# prints the tree for blahblah1
We should fix this by just changing the test cases to say
>>> splparser.parser.parse("blahblah1").print_tree()
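The cascade can be reproduced without the real parser. In the sketch below, `parse` is a hypothetical stand-in for `splparser.parser.parse` that fails on anything not starting with `search`; the two runner functions imitate the old two-step test style and the chained style.

```python
def parse(query):
    """Hypothetical stand-in for splparser.parser.parse: fails on unknown commands."""
    if not query.startswith("search"):
        raise ValueError(f"cannot parse {query!r}")
    return f"tree({query})"


def run_two_step(queries):
    """Old test style: a failed parse leaves the previous result bound to `parsed`."""
    printed, parsed = [], None
    for query in queries:
        try:
            parsed = parse(query)
        except ValueError:
            pass  # as in a doctest session, execution continues with the stale name
        printed.append(parsed)
    return printed


def run_chained(queries):
    """New test style: each printed result comes only from its own parse call."""
    printed = []
    for query in queries:
        try:
            printed.append(parse(query))
        except ValueError:
            printed.append("ERROR")
    return printed
```

With the inputs `["search foo", "blahblah2"]`, the two-step runner reports the first tree twice, while the chained runner reports the first tree followed by an error for the second query.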
See LexError for example of field you might need to print.
eg: |history | head 2000 | search event_count>0 OR result_count>0 | dedup search | table search
eg.
search set_sos_index
host="SPLUNK1.EDM.LOCAL" sourcetype="lsof"\n | head 1\n | multikv \n | get_splunk_process_type_lsof
Steps to Reproduce:
import splparser.parser
EVAL xrefs = mvjoin(mvzip('plugin.xrefs{}.type', 'plugin.xrefs{}.id', " #"), ",")
eg.
search audit failure | fields + Message
Most tests have been changed from:
parsed = splparser.parser.parse("search foo")
parsed.print_tree()
to:
splparser.parser.parse("search foo").print_tree()
This is because the former way causes errors to cascade. Rex tests need to be changed to look like the latter to address this bug too.
Error messages should go to stderr.
Self-explanatory.
Should be straightforward.
Otherwise this is a possible bug.
eg.
'tstats max(time) FROM datamodel=Web'
even this simple one.
The output of the search command when there is an argument like key=value is sometimes:
('EQ')
('KEY')
('VALUE')
and other times:
('KEY')
('VALUE')
This should be consistent across commands.
Self-explanatory from title.
Problem: Although each command sets up logging and specifies a file of the name parser.log, these log files are not written to. It appears that all debugging messages get logged to splparser.log instead.
Possible solutions:
(1) Each command should have its own logging: Fix it so that each command writes to its own log file. In this case, choose a single place in the filesystem where all command logging is written to (configured at installation time).
(2) All commands share logging: Although it's not clear why this works, it's maybe acceptable for all commands to write to splparser.log. In this case, the useless duplicate code that (fails to) set up logging correctly can be removed from all command parser modules.
eg.
| loadjob rt_scheduler__dscolandsa_U3BsdW5rX2Zvcl9BY3RpdmVEaXJlY3Rvcnk__RMD5771d74ce16fe2d13_at_1393942723_1106.1 | head 1 | tail 1| search search=""
we only have field+optlist but not optlist+field
eg.
search eventtype=msad-successful-computer-logons user="$" dest_nt_domain="EDM"|table _time,host,src_ip|dedup consecutive=T src_ip|lookup SiteInfo host|table _time,src_ip,Site
Also add an option to run_tests.py to exclude unimplemented commands.
Right now they are all really slow, probably because they each spend a couple of seconds importing the parser.