eql's People

Contributors

brokensound77, cavokz, dcode, eric-forte-elastic, itsnotapt, mikaayenson, rw-access, zackpayton

eql's Issues

Filter Request: limit

Requesting an additional pipe, limit, that caps the number of records returned from a query.

$ eql query -f my-sysmon-data.json 'ParentImage != null | limit 1' | jq

This would be comparable to the LIMIT keyword in SQL, returning only the top N records returned by the query. This is useful for getting a handle on the structure and format of the data that the analyst is working with.
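As a rough illustration of the requested behavior (not the library's implementation), a limit stage can be modeled as a lazy cut-off over the stream of matching events. The function name `limit` and the sample events below are hypothetical.

```python
from itertools import islice


def limit(events, n):
    """Return an iterator over at most n events from any iterable.

    The input is consumed lazily, so the remaining events are never read.
    """
    if n < 0:
        raise ValueError("limit must be non-negative")
    return islice(events, n)


# Hypothetical stream of matching events (a generator, so nothing is materialized).
matches = ({"ParentImage": "proc%d.exe" % i} for i in range(1000))
first = list(limit(matches, 1))
```

Because the cut-off is lazy, a large input file would only be read far enough to produce the first N matches.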

Optimize wildcard(x, ...) or wildcard(x, ...)

When building an OR of two wildcard checks of the same field, there are two separate function calls. But it would be more optimal for many backends if these were grouped up. The AST for wildcard(x, "a", "b", "c") or wildcard(x, "d", "e", "f") could be rewritten to wildcard(x, "a", "b", "c", "d", "e", "f")

Here's the current behavior:

$ python -m eql optimize 'a == "foo*" or a == "*bar*" or a == "*b*a*z*"'
a == "foo*" or a == "*bar*" or a == "*b*a*z*"

Expected:

$ python -m eql optimize 'a == "foo*" or a == "*bar*" or a == "*b*a*z*"'
wildcard(a, "foo*", "*bar*", "*b*a*z*")
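One way to picture the rewrite is sketched below. It does not use the real eql AST classes; each term of the OR is a hypothetical `("wildcard", field, patterns)` tuple, and the function name `merge_wildcards` is made up for illustration.

```python
from collections import OrderedDict


def merge_wildcards(or_terms):
    """Merge wildcard calls on the same field within a single OR expression.

    Each term is a hypothetical ("wildcard", field, patterns) tuple; any other
    term is passed through unchanged.
    """
    merged = OrderedDict()  # field -> combined pattern list
    result = []
    for term in or_terms:
        if isinstance(term, tuple) and len(term) == 3 and term[0] == "wildcard":
            _, field, patterns = term
            if field not in merged:
                # First wildcard on this field: keep its position in the OR.
                merged[field] = []
                result.append(("wildcard", field, merged[field]))
            for pattern in patterns:
                if pattern not in merged[field]:
                    merged[field].append(pattern)
        else:
            result.append(term)
    return result


terms = [("wildcard", "x", ["a", "b", "c"]), ("wildcard", "x", ["d", "e", "f"])]
combined = merge_wildcards(terms)
```

The sketch relies on OR being insensitive to the order of its terms, so pulling a later wildcard forward into the first call on the same field should not change the result.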

Uncaught exception for macros that return fields

Describe the bug

There's an uncaught exception when macros return fields.

To Reproduce

Steps to reproduce the behavior:

  1. Make a macro that returns a field (e.g. macro SELF(a))
  2. Use the macro in a query foo where SELF(bar) == "baz"
  3. Note the traceback

Expected behavior

Should successfully return a type hint instead.

Screenshots

Traceback (most recent call last):
  ...

  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 1262, in parse_analytics
    return [parse_analytic(r, preprocessor=preprocessor, **kwargs) for r in analytics]
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 1247, in parse_analytic
    query = parse_query(text, preprocessor=preprocessor, **kwargs)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 1198, in parse_query
    subqueries=subqueries, pipes=pipes)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 1134, in _parse
    eql_node = walker.visit(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 275, in visit
    return Interpreter.visit(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 279, in visit
    return f(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 861, in piped_query
    first = self.visit(node["base_query"])
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 275, in visit
    return Interpreter.visit(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 279, in visit
    return f(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 856, in base_query
    return self.visit(node.children[0])
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 275, in visit
    return Interpreter.visit(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 279, in visit
    return f(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 827, in event_query
    node_info = self.visit(node.children[-1])  # type: NodeInfo
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 275, in visit
    return Interpreter.visit(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 279, in visit
    return f(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 662, in and_expr
    return self.bool_expr(node, ast.And)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 650, in bool_expr
    terms = self.visit_children(node)  # type: list[NodeInfo]
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 265, in visit_children
    return Interpreter.visit_children(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 283, in visit_children
    for child in tree.children]
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 275, in visit
    return Interpreter.visit(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 279, in visit
    return f(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 666, in or_expr
    return self.bool_expr(node, ast.Or)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 650, in bool_expr
    terms = self.visit_children(node)  # type: list[NodeInfo]
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 265, in visit_children
    return Interpreter.visit_children(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 283, in visit_children
    for child in tree.children]
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 275, in visit
    return Interpreter.visit(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 279, in visit
    return f(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 566, in comparison
    left, comp_op, right = self.visit_children(node)  # type: (NodeInfo, str, NodeInfo)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 265, in visit_children
    return Interpreter.visit_children(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 283, in visit_children
    for child in tree.children]
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 275, in visit
    return Interpreter.visit(self, tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/lark/visitors.py", line 279, in visit
    return f(tree)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 752, in function_call
    type_hint = self._get_type_hint(node, expanded)
  File "<python root directory>/venv/lib/python2.7/site-packages/eql/parser.py", line 722, in _get_type_hint
    ast_node, type_hint = self._update_field_info(NodeInfo(ast_node, source=node))
TypeError: 'NodeInfo' object is not iterable

Bump version to 0.9.13 and tag release

With the Lark upgrade, would like to bump a new release so downstream projects can leverage the updated dependency.

It looks like the current version isn't tagged on GitHub. It would be nice to tag a release and push it to PyPI.

[FR] Add support for sequence alias under elastic_endpoint_syntax

New Feature Description

Recently, alias sequences merged into endpoint-dev EQL, which will require updates to support this.

Ex.

sequence by user.name
[process where process.name == "cmd.exe"] as parentEvent
[network where parentEvent.process.id == process.id]

Example of a detection use case of process A running as user1 spawns a child process running as a different user:

sequence with maxspan=1m
 [process where event.action == "start" and user.id : "S-1-5-21*"] as event0 by process.entity_id
 [process where event.action == "start" and event0.user.id  != user.id ] by process.parent.entity_id

Backport changes from Elasticsearch EQL

Context

EQL is being developed directly in Elasticsearch and can be tracked here: elastic/elasticsearch#51556

We've identified places where we need to tighten the semantics for EQL, make changes to existing behavior, or limit what can be expressed in the language. This meta issue is to track all the changes we need to back port to resolve incongruities between Endpoint and Elasticsearch EQL. This will help prepare users of EQL and Elastic endpoint security to make migration easier.

Changes will be tracked in the feature/backport branch.

Parser and validation updates

  • add support for `backtick` identifiers #19
  • require explicit boolean behaviors: no auto-casting of bools #18
  • allow null to be compared to anything. currently, null == 5 is not allowed #18
  • TBD: multi-valued/array field type validation
  • use Elasticsearch time units for sequences, and remove float intervals #23
    elastic/elasticsearch#54760

Runtime updates

  • function behavior: always propagate nulls from required arguments #18
  • correct handling of three-value boolean logic (true, false, null) #18
  • toggle-able case-sensitivity. right now, case-insensitivity is always on. we should add an option to turn this off, while preserving the default behavior. this parameter could be set by updating the config set to the parser or inspecting the rule metadata.
  • TBD: multi-valued functions (will make arraySearch and arrayContains redundant). this isn't actually supported yet within ES, which only uses scalar values for painless

To be determined

  • ES allows multi-valued fields to be indexed and queried, but within painless we may get one member back instead of the full array (need to investigate. see elastic/elasticsearch#54970)
  • if so, we should allow == and other comparisons to check array membership

Test Suite

  • As changes are made, the test_queries.toml file should be updated
  • We should update the test suite to externalize multiple tests
    • syntax and parsing
    • semantics, type checking, verifying, etc
    • folding
    • run-time function evaluation
    • integration testing with query and result validation

cc @colings86 @costin @paulewing

Use non-capturing groups for regular expressions

Some regular expressions use capturing groups, but the captures are never read, so they are unnecessary and wasteful.
Instead, we should prefer non-capturing groups with (?:...).

A few affected places

  • match
  • matchLite
  • cidrMatch -- the regular expressions that it generates
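For reference, the difference is easy to see with Python's `re` module. The patterns below are made-up examples, not the expressions the library generates.

```python
import re

# Both patterns accept the same strings; only the first stores a group.
capturing = re.compile(r"(foo|bar)\.exe")
non_capturing = re.compile(r"(?:foo|bar)\.exe")

# Number of capture groups the engine has to track for each pattern.
group_counts = (capturing.groups, non_capturing.groups)

# The match result is identical apart from the stored group.
same_match = (
    capturing.match("bar.exe").group(0) == non_capturing.match("bar.exe").group(0)
)
```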

Add support for repeated sequences under elasticsearch_syntax

New Feature Description

Recently, repeated sequences merged into elasticsearch EQL, which will require updates to support this.

Ex:

sequence
  [process where opcode == 1] by unique_pid
  [file where opcode == 0] by unique_pid with runs=2
  [network where opcode == 0] by unique_pid

Query shorthand

New Feature Description

Add shorthand for queries, so lengthy query templates can be reused.
This could be accomplished with a custom definition. The keyword name filter is up for debate, since it currently collides with the filter pipe.

filter services_child = process where subtype.create and parent_process_name == "services.exe"

With that definition you've created a new event type, and can use it:

services_child where process_name == "cmd.exe"

Source text missing from errors

Describe the bug

Semantic error messages sometimes omit the source text, leaving only the caret line.

To Reproduce

$ eql query 'sequence
[process where true]
[file where  badFunction()]
[network where true]
[process where true]
[network where true]
'
Error at line:3,column:14
Unknown function badFunction

^^^^^^^^^^^

Expected behavior

$ python -m eql query 'sequence
[process where true]
[file where  badFunction()]
[network where true]
[process where true]
[network where true]
'

Error at line:3,column:14
Unknown function badFunction
sequence
[process where true]
[file where  badFunction()]
             ^^^^^^^^^^^
[network where true]
[process where true]
[network where true]
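The expected output above could be produced along these lines. This is only a sketch: `format_error` and its parameters are hypothetical names, and it assumes 1-based line and column numbers, which matches the "line:3,column:14" position of badFunction in the example.

```python
def format_error(source, line, column, width, message):
    """Render an error with the full source and a caret line under the bad token.

    line and column are assumed to be 1-based; width is the token length.
    """
    out = ["Error at line:%d,column:%d" % (line, column), message]
    for number, text in enumerate(source.splitlines(), start=1):
        out.append(text)
        if number == line:
            # Pad to the token's column, then underline the whole token.
            out.append(" " * (column - 1) + "^" * width)
    return "\n".join(out)


query = "\n".join([
    "sequence",
    "[process where true]",
    "[file where  badFunction()]",
    "[network where true]",
])
rendered = format_error(query, 3, 14, len("badFunction"), "Unknown function badFunction")
```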

Changelog

It seems that packages are being deployed to PyPI, but there is no changelog, release notes, or git tags for releases. Is it possible to have details on what has changed in each release when it happens?

Update syntax for colon operator

On the Elasticsearch side, the : syntax has been updated. We should update the syntax for : here. Currently, this syntax is only recognized when opting-in to the Elasticsearch grammar. See #40

On the right side of ::

  • supports wildcards
  • supports a list
  • does not support arbitrary expressions
  • syntactically accepts any constant, but semantically it only allows strings. This makes error messages slightly better.

Implement SQL-consistent null and boolean handling

Part of #15

We need to implement three-value logic for null, true and false consistent with Elasticsearch EQL and Elasticsearch SQL.

Some necessary changes

  • Any value can be compared to null
  • The only way to explicitly check for null is == null or != null (plus commutative versions)
    • Likely need an intermediate ast node to delineate these nulls from folded nulls
  • All other comparisons to null return null (< null, <= null, etc)
  • Dynamic nulls can not be compared. process_name == parent_process_name will be null even if both values are null. This is a change from the current implementation
  • Implement boolean logic similarly, while maintaining the short-circuiting properties: ... and false => false, and ... or true => true
  • Disable implicit boolean casting. We can keep a toggle to manually turn it back on
  • Functions with any null required inputs return null
  • Might be worthwhile to track functions that can return null, even when all parameters are the correct-type and non-null

And of course, update existing tests to be consistent
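A small sketch of the three-value logic described above, using Python's `None` to stand in for null. The helper names are hypothetical, and `eq3` models only the dynamic comparison case (field against field), not the explicit == null check.

```python
def and3(left, right):
    """Three-valued AND: false dominates, then null, then true."""
    if left is False or right is False:
        return False
    if left is None or right is None:
        return None
    return True


def or3(left, right):
    """Three-valued OR: true dominates, then null, then false."""
    if left is True or right is True:
        return True
    if left is None or right is None:
        return None
    return False


def not3(value):
    """Three-valued NOT: null stays null."""
    return None if value is None else not value


def eq3(left, right):
    """Dynamic equality: null whenever either side is null, even if both are."""
    if left is None or right is None:
        return None
    return left == right
```

Note that `and3(None, False)` still yields false and `or3(None, True)` still yields true, which is what keeps the short-circuiting properties intact.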

[FR] Add IPv6 Support to CidrMatch

Summary

We would like to add IPv6 support to the CidrMatch function, so the eql library can evaluate queries that use CidrMatch with IPv6 addresses and stay consistent with Elasticsearch.

For more implementation specific details please see the PR.
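For readers without the PR at hand, the intended semantics can be approximated with Python's standard `ipaddress` module. This is an independent sketch, not the code from the PR; `cidr_match` is a hypothetical name, and returning False for an unparseable address is an assumption.

```python
import ipaddress


def cidr_match(address, *cidrs):
    """Return True if address falls inside any of the CIDR blocks (IPv4 or IPv6)."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Assumption: an address that does not parse simply does not match.
        return False
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if ip.version == network.version and ip in network:
            return True
    return False


results = (
    cidr_match("2001:db8::1", "2001:db8::/32"),
    cidr_match("10.0.0.5", "2001:db8::/32", "10.0.0.0/8"),
    cidr_match("fe80::1", "10.0.0.0/8"),
)
```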

Add feature flag for Elasticsearch parsing

Add a feature flag to accept the syntax for the Elasticsearch changes to EQL.
Leave the runtime behavior as-is for now, but we should be able to validate with the Elasticsearch 7.10 changes. This'll be necessary for elastic/detection-rules unit testing.

Changes to syntax with this flag:

  • = is not allowed as an alternative to ==
  • ?" and ?' not allowed
  • : is allowed
  • """ allowed.

We could also use the toggle to be extra strict and disable additional functions, ancestry, etc.

FYI @brokensound77

Remove unsupported sequence parsing

Part of #15

Per elastic/elasticsearch#55032, some sequence behavior is no longer supported and some changes need to be backported. This should be done in a way that still recognizes previous syntax but has custom error messages, to ease the migration path.

  • non-integer time units should be removed
  • time units should always be required
  • non-ES time intervals shouldn't be supported. the units should be stricter as well (e.g. accepting h but no longer hours)
  • undocumented fork must be a boolean, instead of the int -> bool coercion that was done before
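The stricter interval rules in this list might be validated roughly as follows. This is a sketch under assumptions: the accepted units are taken to be the Elasticsearch time units (d, h, m, s, ms, micros, nanos), the legacy unit names in `LEGACY_UNITS` are illustrative guesses apart from hours, which the list mentions, and `parse_interval` is a hypothetical helper.

```python
import re

ES_UNITS = {"d", "h", "m", "s", "ms", "micros", "nanos"}
# Hypothetical mapping of previously accepted long names to their replacement.
LEGACY_UNITS = {"hours": "h", "hour": "h", "minutes": "m", "seconds": "s", "days": "d"}

_INTERVAL = re.compile(r"^(\d+(?:\.\d+)?)\s*([a-zA-Z]*)$")


def parse_interval(text):
    """Parse an interval such as '5m' into (5, 'm'), with migration-friendly errors."""
    match = _INTERVAL.match(text.strip())
    if not match:
        raise ValueError("Invalid time interval: %r" % text)
    number, unit = match.groups()
    if "." in number:
        raise ValueError("Non-integer intervals are no longer supported: %r" % text)
    if not unit:
        raise ValueError("A time unit is required: %r" % text)
    if unit in LEGACY_UNITS:
        raise ValueError(
            "Unit %r is no longer supported, use %r" % (unit, LEGACY_UNITS[unit])
        )
    if unit not in ES_UNITS:
        raise ValueError("Unknown time unit: %r" % unit)
    return int(number), unit
```

The old forms are still recognized by the pattern, so each one gets its own error message instead of a generic syntax error.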

Remove case_sensitive argument from between function

Yes, we can remove the case sensitive parameter from this function.

The function uses case sensitivity when finding the matching substrings left and right within source. But it always returns the substring between those in its original case. If we remove that parameter, we'll have to figure out how to perform case-(in)sensitive matching while returning the original string. It's not as simple as between(lower(source), lower(left), lower(right)).

When greedy=false, that makes the function roughly equivalent to

substring(source, // notice the original casing
          indexOf(lower(source), lower(left)) + length(left),
          indexOf(lower(source), lower(right), indexOf(lower(source), lower(left)) + length(left))
          )

Originally posted by @rw-access in elastic/elasticsearch#54411 (comment)
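The idea in the quoted comment, matching on a lowercased copy while slicing the original string, can be sketched in Python as below. It is an illustration rather than the function's real implementation, and returning an empty string when either delimiter is missing is an assumption.

```python
def between(source, left, right, greedy=False):
    """Case-insensitive between() that returns the substring in its original casing."""
    folded = source.lower()
    folded_left = left.lower()
    folded_right = right.lower()

    start = folded.find(folded_left)
    if start == -1:
        return ""  # assumption: no match yields an empty string
    start += len(folded_left)

    # Greedy takes the last occurrence of right, non-greedy the first one.
    if greedy:
        end = folded.rfind(folded_right, start)
    else:
        end = folded.find(folded_right, start)
    if end == -1:
        return ""
    # Offsets come from the lowercased copy but index into the original string.
    return source[start:end]


lazy = between("x[A][B]y", "[", "]")
eager = between("x[A][B]y", "[", "]", greedy=True)
mixed_case = between("HelloWorldGoodbye", "hello", "GOODBYE")
```

One caveat: for a few Unicode characters, lowercasing changes the string length, so offsets taken from the lowercased copy may not line up with the original. A production version would need to handle that.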

Extend attributes for arguments in macros

There are cases where we want to get subfields from within a macro, to make things more flexible and composable.
One example:

macro IS_TRUSTED(authenticode)
   authenticode.signature_signer == "Microsoft Corporation" and authenticode.signature_status == "trusted"

This will make it easier to do things like

IS_TRUSTED(events[0])
IS_TRUSTED(data.triggering_fact_array.events[5].data_buffer)

IS_TRUSTED(SOURCE_PROCESS())

[FR] Python Engine Unit Test Missing for Core Features

Summary

None of the core features (see changelog) have unit tests that verify evaluation works as expected with the PythonEngine. We should add unit tests that load small sample data sets and check that each feature evaluates as expected.

[Bug] 'as' keyword used in Autonomous System Fields.

Describe the bug

The new as keyword introduced in 0.9.14 collides with the ECS "as" (autonomous system) fields and causes an error when a query references them (e.g. client.as.organization.name.text).

To Reproduce

Run eql.parse_query with any query that has as fields.

Expected behavior

This edge case for keywords should not throw an Invalid use of keyword error.

Remove forced floating point divisions

Relates to #15

Currently in EQL, all divisions are forced to be floating-point divisions, with no way to opt-out. Instead, if both the dividend and divisor are integers, we should use integer division. We should follow the behavior 3 / 2 ==> 1 instead of 3 / 2 ==> 1.5.

To continue using floating-point division, make either the dividend or the divisor a float/double.
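A sketch of the proposed division rule, with `eql_divide` as a hypothetical helper name. The issue only specifies 3 / 2 ==> 1; truncating toward zero for negative operands (as Java does) is an assumption made here, since Python's own // operator rounds toward negative infinity instead.

```python
def eql_divide(dividend, divisor):
    """Integer division when both operands are integers, float division otherwise."""
    both_int = all(
        isinstance(value, int) and not isinstance(value, bool)
        for value in (dividend, divisor)
    )
    if both_int:
        # Assumption: truncate toward zero rather than floor.
        quotient = abs(dividend) // abs(divisor)
        return quotient if (dividend < 0) == (divisor < 0) else -quotient
    return dividend / divisor


int_result = eql_divide(3, 2)
float_result = eql_divide(3, 2.0)
```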

We'll also need to update the tests for changing this behavior.

Separate out the optimizer from the AST

For better practice, the optimize step should be separated from converting the Lark tree to the EQL tree. Instead of optimizing along the way, it should be completed as a separate phase.

With this, we can also disable optimizations entirely. That will enable better testing, manipulation and conversion on the unmodified tree.

Simplify case_sensitive/case_insensitive flags for TOML

Based on some feedback, the tests should be simplified.

Every test must specify case_sensitive=true, case_insensitive=true or both.

Currently, tests that support both do not specify, but this implicit behavior is a little backwards. Instead, we should always be explicit. Tests that work for both will look like:

case_sensitive = true
case_insensitive = true
