
plyara / plyara


Parse YARA rules and operate over them more easily.

Home Page: https://plyara.readthedocs.io/

License: Apache License 2.0

Python 73.91% YARA 26.09%
yara yara-rules parser yara-parser lexer ply python python3 sly

plyara's Introduction

plyara


Parse YARA rules into a dictionary representation.

Plyara is a script and library that lexes and parses a file consisting of one or more YARA rules into a Python dictionary representation. The goal of this tool is to make it easier to perform bulk operations or transformations on large sets of YARA rules, such as extracting indicators, updating attributes, and analyzing a corpus. Other applications include linters and dependency checkers.

Plyara leverages the Python module PLY for lexing YARA rules.

This is a community-maintained fork of the original plyara by 8u1a. The "plyara" trademark is used with permission.

Installation

Plyara requires Python 3.6+.

Install with pip:

pip3 install plyara

Usage

Use the plyara Python library in your own applications:

>>> import plyara
>>> parser = plyara.Plyara()
>>> mylist = parser.parse_string('rule MyRule { strings: $a="1" \n condition: false }')
>>>
>>> import pprint
>>> pprint.pprint(mylist)
[{'condition_terms': ['false'],
  'raw_condition': 'condition: false ',
  'raw_strings': 'strings: $a="1" \n ',
  'rule_name': 'MyRule',
  'start_line': 1,
  'stop_line': 2,
  'strings': [{'name': '$a', 'type': 'text', 'value': '1'}]}]
>>>

Or, use the included plyara script from the command line:

$ plyara -h
usage: plyara [-h] [--log] FILE

Parse YARA rules into a dictionary representation.

positional arguments:
  FILE        File containing YARA rules to parse.

optional arguments:
  -h, --help  show this help message and exit
  --log       Enable debug logging to the console.

The command-line tool will print valid JSON output when parsing rules:

$ cat example.yar
rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        thread_level = 3
        in_the_wild = true
    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
    condition:
        $a or $b or $c
}

$ plyara example.yar
[
    {
        "condition_terms": [
            "$a",
            "or",
            "$b",
            "or",
            "$c"
        ],
        "metadata": [
            {
                "description": "This is just an example"
            },
            {
                "thread_level": 3
            },
            {
                "in_the_wild": true
            }
        ],
        "raw_condition": "condition:\n        $a or $b or $c\n",
        "raw_meta": "meta:\n        description = \"This is just an example\"\n        thread_level = 3\n        in_the_wild = true\n    ",
        "raw_strings": "strings:\n        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}\n        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}\n        $c = \"UVODFRYSIHLNWPEJXQZAKCBGMT\"\n    ",
        "rule_name": "silent_banker",
        "start_line": 1,
        "stop_line": 13,
        "strings": [
            {
                "name": "$a",
                "type": "byte",
                "value": "{6A 40 68 00 30 00 00 6A 14 8D 91}"
            },
            {
                "name": "$b",
                "type": "byte",
                "value": "{8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}"
            },
            {
                "name": "$c",
                "type": "text",
                "value": "UVODFRYSIHLNWPEJXQZAKCBGMT"
            }
        ],
        "tags": [
            "banker"
        ]
    }
]
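
Since the output is plain JSON, the "extracting indicators" use case mentioned in the introduction can be handled with the standard library alone. The following is a minimal sketch, not part of plyara: the helper name text_strings_by_rule is hypothetical, and the input is a trimmed copy of the output shown above rather than the result of a real run.

```python
import json

# Trimmed copy of the JSON shown above; a real script would read plyara's output instead.
PLYARA_JSON = """
[
    {
        "rule_name": "silent_banker",
        "strings": [
            {"name": "$a", "type": "byte", "value": "{6A 40 68 00 30 00 00 6A 14 8D 91}"},
            {"name": "$c", "type": "text", "value": "UVODFRYSIHLNWPEJXQZAKCBGMT"}
        ],
        "tags": ["banker"]
    }
]
"""


def text_strings_by_rule(rules):
    """Map each rule name to the values of its strings of type 'text' (hypothetical helper)."""
    result = {}
    for rule in rules:
        # Rules without a strings section simply yield an empty list.
        values = [s["value"] for s in rule.get("strings", []) if s.get("type") == "text"]
        result[rule["rule_name"]] = values
    return result


rules = json.loads(PLYARA_JSON)
print(text_strings_by_rule(rules))
```

The same loop can be pointed at the "byte" type or at the tags list, depending on which indicators are of interest.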

Reusing The Parser

If you want to reuse a single instance of the parser object for efficiency when parsing large quantities of rules or rulesets, you must call the new clear() method between parses.

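import plyara

rules = list()
parser = plyara.Plyara()

# files: an iterable of paths to YARA rule files
for file in files:
    with open(file, 'r') as fh:
        yararules = parser.parse_string(fh.read())
        rules += yararules
    # Reset the parser state before parsing the next file
    parser.clear()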

Migration

If you used an older version of plyara, and want to migrate to this version, there will be some changes required. Most importantly, the parser object instantiation has changed. It was:

# Old style - don't do this!
import plyara.interp as interp
rules_list = interp.parseString(open('myfile.yar').read())

But is now:

# New style - do this instead!
import plyara
parser = plyara.Plyara()
rules_list = parser.parse_string(open('myfile.yar').read())

The existing parsed keys have stayed the same, and new ones have been added.

When reusing a parser for multiple rules and/or files, be aware that imports are now shared across all rules - if one rule has an import, that import will be added to all rules in your parser object.

Contributing

  • If you find a bug, or would like to see a new feature, Pull Requests and Issues are always welcome.
  • By submitting changes, you agree to release those changes under the terms of the LICENSE.
  • Writing passing unit tests for your changes, while not required, is highly encouraged and appreciated.
  • Please run all code contributions through each of the linters that we use for this project: pycodestyle, pydocstyle, and pyflakes. See the .travis.yml file for exact use. For more information on these linters, please refer to the Python Code Quality Authority: http://meta.pycqa.org/en/latest/

Discussion

  • You may join our IRC channel on irc.freenode.net #plyara

plyara's People

Contributors

8u1a, anlutro, codacy-badger, hillu, jselvi, malvidin, malwarefrank, neo23x0, rholloway, ronbarrey, rshipp, ruppde, taskr, tasssadar, utkonos, wesinator


plyara's Issues

Add to documentation

  • Include at least some sample input/output, and some basic example code.
  • Add a changelog and migration guide somewhere, to ease transitions for anyone who was using the old version.
  • Modify the alabaster settings too, with at least a link to the github repo.

Error excluding comments in condition lines

I noticed that a comment in a condition is parsed as if it were part of the condition and appears in condition_terms:

rule EXE_extension_cloaking {
   meta:
      description = "Executable showing different extension (Windows default 'hide known extension')"
      author = "Florian Roth"
   condition:
      filename matches /\.txt\.exe$/is or   // Special file extensions
      filename matches /\.pdf\.exe$/is   // Special file extensions
}

>>> rule["condition_terms"]
['filename', 'matches', '/\\.txt\\.exe$/is or   //', 'Special', 'file', 'extensions', 'filename', 'matches', '/\\.pdf\\.exe$/is']

Rule imports pollution

If YARA rule sets include module imports like "pe", these will be propagated to all rules in the set, regardless of whether they are used in the conditions.

Consider the following set:

rule sample_1
{
    strings:
        $s = "my_sample_string"
    condition:
        $s
}

import "pe"

rule sample_2
{
    condition:
        pe.number_of_resources > 1
}

After parsing, both rules will include the "pe" import, whereas it should be available only in the sample_2 rule.

Plyara version is 2.0.

Thanks in advance.

Rules Throw Unhandled String Count Condition Error

The following rules throw Unhandled String Count Condition when running utils.generate_logic_hash():

rule test {
    strings:
        $ = /abc/
        $ = /def/
        $ = /ghi/
    condition:
        for any of ($*) : ( for any i in (1..#): (uint8(@[i] - 1) == 0x00) )
}

rule test {
    strings:
        $a = "ssi"
    condition:
        for all i in (1..#a) : (@a[i] >= 2 and @a[i] <= 5)
}

rule test {
    strings:
        $a = "ssi"
        $b = "mi"
    condition:
        for all i in (1..#a) : ( for all j in (1..#b) : (@a[i] >= @b[j]))
}

rule test {
    strings:
        $a = "ssi"
    condition:
        for all i in (1..#a) : (@a[i] == 5)
}

A string with no rule throws an AttributeError

A string which consists only of imports and/or comments but no rule definitions (which is not uncommon for large collections organized in multiple files) throws AttributeError: 'NoneType' object has no attribute 'type'.

Meta quotes removed upon rebuilding rule

Noticed this today while playing with this library. Upon parsing and rebuilding a rule, the double quotes are removed from values within the meta section.

Example code:

import plyara
ply = plyara.Plyara()

rule = """rule garbageRule
{
	meta:
		author = "Josh Grunzweig"

	strings:
		$garbage = "hi world"

	condition:
		any of them
}"""
p = ply.parse_string(rule)
print(repr(p))
print(ply.rebuild_yara_rule(p[0]))

Output:

[{'condition_terms': ['any', 'of', 'them'], 'raw_strings': 'strings:\n\t\t$garbage = "hi world"\n\n\t', 'raw_condition': 'condition:\n\t\tany of them\n', 'raw_meta': 'meta:\n\t\tauthor = "Josh Grunzweig"\n\n\t', 'rule_name': 'garbageRule', 'stop_line': 11, 'start_line': 1, 'strings': [{'name': '$garbage', 'value': '"hi world"'}], 'metadata': {'author': 'Josh Grunzweig'}}]
rule garbageRule {

	meta:
		author = Josh Grunzweig

	strings:
		$garbage = "hi world"

	condition:
		any of them
}

The unit test also appears to fail on this particular item:

======================================================================
FAIL: test_rebuild_yara_rule_metadata (__main__.TestStaticMethods)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/unit_tests.py", line 82, in test_rebuild_yara_rule_metadata
    self.assertTrue('string_value = "TEST STRING"' in unparsed)
AssertionError: False is not true
----------------------------------------------------------------------

Comments in a BYTESTRING line throw a TypeError

If a BYTESTRING contains a comment, a TypeError: Illegal bytestring character is thrown.
Example rule that causes this:

rule test {
    strings:
    $a = {
        00 01   // this comment breaks plyara
    }
    condition:
    $a
}

Not getting last included file in parser.includes

After parsing a file with only includes, I check the parser.includes attribute and the last item is not there:

>>> import plyara
>>> parser = plyara.Plyara()
>>> parser.parse_string('include "file1.yara"\ninclude "file2.yara"\ninclude "file3.yara"')
[]
>>> parser.includes
['file1.yara', 'file2.yara']

rebuild_yara_rule method does not correctly quote metadata fields

Hey,

In this part of the rebuild_yara_rule method, the data type of each metadata value is not taken into account, which means string values in the resulting rules are written without quotes:

# Rule Metadata
        if rule.get('metadata'):
            unpacked_meta = [u'\n\t\t{key} = {value}'.format(key=k, value=v)
                             for k, v in rule['metadata'].items()]
            rule_meta = u'\n\tmeta:{}\n'.format(u''.join(unpacked_meta))
        else:
            rule_meta = u''

A lazy fix for this would be to alter it so that we assume every value in the metadata is a string:

if rule.get('metadata'):
            unpacked_meta = [u'\n\t\t{key} = "{value}"'.format(key=k, value=v)
                             for k, v in rule['metadata'].items()]
            rule_meta = u'\n\tmeta:{}\n'.format(u''.join(unpacked_meta))
        else:
            rule_meta = u''

There might be a cleverer way of figuring it out though, since there are probably only 3 cases (int,str,list) that are ever used.

Cheers,
Tom
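
The type-aware variant hinted at above could look roughly like the following. This is a sketch under assumptions, not plyara's actual code: format_meta_value and build_meta_section are hypothetical names, metadata is assumed to be a plain dict as in the snippet quoted in this issue, and only bool, int, and string values are handled.

```python
def format_meta_value(value):
    """Render a Python value as a YARA meta literal (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # Anything else is treated as a string; escape backslashes and double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"'.format(escaped)


def build_meta_section(metadata):
    """Build the meta: block from a dict of key -> Python value."""
    if not metadata:
        return ""
    lines = ["\n\t\t{} = {}".format(key, format_meta_value(value))
             for key, value in metadata.items()]
    return "\n\tmeta:{}\n".format("".join(lines))


print(build_meta_section({"author": "Josh Grunzweig", "thread_level": 3, "in_the_wild": True}))
```

Checking bool before int matters here: without that ordering, True would be emitted as the string "True" rather than the YARA literal true.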

Update PLY to SLY

dabeaz has rewritten PLY from scratch and modernized everything. The new project is called SLY. It has not hit a stable release yet. However, we should prepare for its stable release and begin our rewrite and refactor soon. More information about SLY on the project page:

https://github.com/dabeaz/sly

Single line comment causes parsing exception

A file containing only a single line comment, such as:

// This rule file has been deleted

Causes the exception:

plyara.exceptions.ParseTypeError: Unknown text / for token of type FORWARDSLASH on line 1

Although this is an edge case, the rule is parsed correctly by Yara itself (and the Python module). It would be good to understand if this is a bug, or if I should continue to work around it with exception handling.

Metadata structure in 2.0.0

Is there a reason why you've decided to change the metadata structure from dict to a list of dicts?

Before:

        "metadata": {
            "MyBool": "true", 
            "MyInt": "10", 
            "MyString": "Test"
        },

Now:

        "metadata": [
            {
                "MyString": "Test"
            },
            {
                "MyInt": 10
            },
            {
                "MyBool": true
            }
        ],

Why not?

        "metadata": {
            "MyBool": true, 
            "MyInt": 10, 
            "MyString": "Test"
        },

For me, it makes metadata values more difficult to handle. I can no longer do things like this:

if 'date' in rule['metadata']:
   ...
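
Until the structure changes, a small helper can restore dict-style lookups on the caller's side. This is a minimal sketch: metadata_as_dict is a hypothetical name and not part of plyara, and if a key appears more than once the last value wins.

```python
def metadata_as_dict(metadata):
    """Merge a list of single-key dicts into one dict; later duplicates overwrite earlier ones."""
    merged = {}
    for entry in metadata:
        merged.update(entry)
    return merged


# Same values as the "Now" example above, written as Python literals.
meta = [{"MyString": "Test"}, {"MyInt": 10}, {"MyBool": True}]
flat = metadata_as_dict(meta)

if "MyInt" in flat:
    print(flat["MyInt"])
```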

detect_dependencies() misses references in parentheses

If a rule has a dependency in its condition section which is in parentheses, it will be missed by detect_dependencies().

Example:

rule a {
    condition:
    true
}

rule b {
    condition:
    (a) and true
}

detect_dependencies() will not detect that rule b references rule a. Remove the parentheses around a in b's condition section and it works.
Some open source YARA repositories have rules with dependencies in parentheses.
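
As a workaround, references can be collected directly from the parsed condition_terms. The sketch below is not how detect_dependencies() is implemented: find_rule_references is a hypothetical helper, the input dicts are written by hand, and it assumes parentheses arrive either as separate terms or attached to the rule name.

```python
def find_rule_references(rule, known_rule_names):
    """Return the names of other rules mentioned in this rule's condition terms."""
    references = []
    for term in rule.get("condition_terms", []):
        # Strip parentheses in case a term arrives as "(a)" rather than "(", "a", ")".
        name = term.strip("()")
        if name in known_rule_names and name != rule["rule_name"] and name not in references:
            references.append(name)
    return references


# Hand-written stand-ins for the parsed form of rules a and b above.
rules = [
    {"rule_name": "a", "condition_terms": ["true"]},
    {"rule_name": "b", "condition_terms": ["(", "a", ")", "and", "true"]},
]
names = {r["rule_name"] for r in rules}

for r in rules:
    print(r["rule_name"], find_rule_references(r, names))
```

Matching on bare names is deliberately naive: a string identifier or module field that happens to share a rule's name would not be told apart, so a real fix belongs in the parser.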

Add more tests

Include enough unit tests that everything is reasonably well tested. It's hard to tell coverage of the ply grammar, but covering at least some standard cases and verifying output is as expected would be beneficial.

It should not be necessary to return a list as metadata for returning python types

I see that most of my scripts are failing to retrieve the metadata keys from parsed yara files because now with v2 it is returning a list.

I think this change comes from fixing #12.

It makes sense to return python types instead of strings, but can't this just be done returning a dictionary as it was done previously?
It makes things much more complicated to locate values from metadata

How it was:

"metadata": {
                "description": "This is just an example",
                "in_the_wild": "true",
                "thread_level": "3"
            }

How it is now:

            "metadata": [
                {
                    "description": "This is just an example"
                },
                {
                    "thread_level": 3
                },
                {
                    "in_the_wild": true
                }
            ],

How I think it should be:

"metadata": {
                "description": "This is just an example",
                "in_the_wild": true,
                "thread_level": 3
            }

Modifying/removing all linked condition terms using current form ?

Hi,

I wanted to write a converter that will take rules that use a superset addition of YARA (such as VT hunting syntax) and convert to a local-only rule that works in regular YARA.

To do this, I need to remove the conditions that don't work in regular YARA.
The way plyara currently parses and structures condition terms makes it difficult to do this, because each individual element is separated individually, and there is no link of related/dependent condition terms (e.g. booleans and new_file, signatures contains "blah").

Not sure how to represent this without using a graph/tree structure, but I think it would make more sense to parse dependent conditions together, such as the case of X contains "y"

Thoughts?

Thanks,

rebuild_yara_rule does not return rule comments

Hi. The method rebuild_yara_rule() seems to be stripping out rule comments.
For example, the following rule:

rule sample
{
    strings:
        $ = { 01 02 03 04 } // string 1
        $ = { 01 02 03 05 } // string 2
    condition:
        all of ($) // condition
}

Is correctly parsed (including comments):

[{'condition_terms': ['all', 'of', '(', '$', ')'], 'raw_strings': 'strings:\n        $ = { 01 02 03 04 } // string 1\n        $ = { 01 02 03 05 } // string 2\n    ', 'raw_condition': 'condition:\n        all of ($) // condition\n', 'comments': ['// condition', '// string 2', '// string 1'], 'rule_name': 'sample', 'stop_line': 8, 'start_line': 1, 'strings': [{'name': '$', 'value': '{ 01 02 03 04 }'}, {'name': '$', 'value': '{ 01 02 03 05 }'}]}]

But then rebuild_yara_rule() ignores comments:

rule sample {
        strings:
                $ = { 01 02 03 04 }
                $ = { 01 02 03 05 }
        condition:
                all of ($)
} 

Is this intentional? On the one hand, this is not an issue if the rebuilt rule is passed to YARA for scanning. On the other hand, if you are rebuilding a rule for intel sharing, the rebuilder should preserve the rule comments.

Thanks in advance!
RD

Offset char @ throws a ParseTypeError

Rule:

rule sample
{
    strings:
        $ = { 01 02 03 04 }
    condition:
        for all of ($) : ( @ < 0xFF )
}

Throws:

plyara.ParseTypeError: Illegal character @ at line 6

Thanks in advance!
RD

New xor modifier causes parsing failure

Version 3.11 of YARA brings a more flexible xor modifier (see changelog).

This means that from version 3.11 onward the xor modifier can also be written like xor(0x01-0xff) to restrict the range.

A simple rule that fails to parse is:

rule xor_test {
    strings:
        $ = "test test test" xor(0x01-0x02)
    condition:
        any of them
}

It's possible this rule breaks the internal state of the parser, because subsequent rules do not seem to correctly parse. However, I have not verified this yet.

ParseValueError's t.lexer.lineno ignores bytestring new lines

The following rule:

rule sample
{
    strings:
        $ = { 00 00 00 00 00 00  
              00 00 00 00 00 00 } //line 5
    conditio: //fault
        all of them
}

Throws:
Unknown text conditio for token of type ID on line 5

Ideally it should consider the bytestring new line and return the exception at line 6.

Thanks in advance.
RD

Single JSON Output Schema

After making a release announcement on Twitter, the maintainers of YARA joined the conversation and mentioned that they're writing a Go implementation of the YARA parser that will provide JSON output as an option. Also, in the same thread, I learned that there is already a Go implementation here:
https://github.com/Northern-Lights/yara-parser

I've opened an issue on their repo for this same topic here:
Northern-Lights/yara-parser#17

Here is my proposal:
Let's coordinate on one single schema for data structure and JSON output format. We can definitely have local variation, but I think having a single schema that is interoperable among all three projects is a good thing. As a first step, I can post an annotated copy of our full JSON schema along with the reasoning behind various decisions. The short term goal would be to have both annotated schemas sent over to the core YARA developers. An ideal situation would be that core adopts as much of our "unified" schema as makes sense. They would then release the official schema when ready. We would then produce JSON that conforms to that official schema. If there are fields that we can't all agree on, we would then have a flag to enable additional local/optional fields in our output.

Error parsing Regex strings in newest v1.2.6

Up to now I was working with version 1.2.5, which worked fine.
The newest master produces errors with many of my regex YARA strings.

Traceback (most recent call last):
  File "mjolnir.py", line 166, in readFiles
    rulesList = p.parse_string(fileData)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/plyara-1.2.6-py3.6.egg/plyara.py", line 210, in parse_string
    yacc.parse(input_string)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ply-3.11-py3.6.egg/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ply-3.11-py3.6.egg/ply/yacc.py", line 1201, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ply-3.11-py3.6.egg/ply/yacc.py", line 192, in call_errorfunc
    r = errorfunc(token)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/plyara-1.2.6-py3.6.egg/plyara.py", line 1087, in p_error
    raise TypeError(u'Unknown text {} for token of type {} on line {}'.format(p.value, p.type, p.lineno))
TypeError: Unknown text [ for token of type LBRACK on line 78

The parser fails on regular expression strings like this

      $d = /\x00https?:\/\/[^\x00]{4,500}\x00\x00\x00/

I guess that this commit introduced the errors:
0c4e814

Bug in regex string lexing

Regex strings with no space before the modifier are valid in YARA, but cause Plyara to choke.

Example:

$a = /rex/nocase

Need to fix the t_REXSTRING regex to handle this along with the existing cases of /rex/ismx and /rex/ nocase.

Rewrite Rule Rebuilding Function

This function does not work on many rules and rulesets. This capability needs to be reworked or removed. It's currently deprecated, but can avoid removal if it can be fixed.

Imports have extra quotes

In the output object, imports have quotes included. Not sure this is expected or desired behavior. Probably want to get rid of the quotes.

        "imports": [
            "\"bango\"", 
            "\"bingo\""
        ], 
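
Until the parser drops them, the quotes can be removed after parsing. A minimal sketch, assuming the imports list looks exactly like the output above; unquote is a hypothetical helper, not a plyara function.

```python
def unquote(value):
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value


# Same values as the "imports" list above, written as Python strings.
imports = ['"bango"', '"bingo"']
print([unquote(i) for i in imports])
```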

Extra quotes added when rebuilding rule

Hi,

When parsing a rule with a specified, but empty description field, like this:

rule sample
{
    meta:
        description = ""
        author = "me"
    strings:
        $ = { 01 02 03 04 } // string 1
        $ = { 01 02 03 05 } // string 2
    condition:
        all of ($) // condition
}

plyara.Plyara.rebuild_yara_rule returns too many quotes, like this:

rule sample
{
    meta:
       description = """"
        author="me"
    strings:
        $ = { 01 02 03 04 } // string 1
        $ = { 01 02 03 05 } // string 2
    condition:
        all of ($) // condition
}

I'm running version 2.0.1

Reusing parser causes rules to show up that are not present

I am using plyara to parse different rules and tracking statistics about them, but when I reuse the parser then the rules from the previous parse_string() call show up in the results of the subsequent unrelated parse_string() calls.

I can see this being useful in some cases, such as incrementally adding rules, but it is counter-intuitive to me that parse_string() is not stateless. I suggest adding an option to make it stateless.
