yaramod's Introduction

yaramod

yaramod is a library that provides parsing of YARA rules into an AST and a C++ programming interface to build new YARA rulesets. This project is not associated with the YARA project.

yaramod also comes with Python bindings and this repository should be fully compatible with installation using pip.

User Documentation

You can find our documentation on Read the Docs.

API Documentation

You can generate the API documentation yourself: pass -DYARAMOD_DOCS=ON to cmake and run make doc.

License

Copyright (c) 2017 Avast Software, licensed under the MIT license. See the LICENSE file for more details.

yaramod uses third-party libraries or other resources listed, along with their licenses, in the LICENSE-THIRD-PARTY file.

Contributing

See RetDec contribution guidelines.

yaramod's People

Contributors

anetakvapilova, bzeba, catap, houndthe, hyuunnn, matejkastak, metthal, mienkofax, msm-code, petermatula, s3rvac, stepanek-m, tadeaskucera, tasssadar, tomaskender, vojone, wayrick, wesinator, xbabka01

yaramod's Issues

Build fails when using Bison 3.2

I attempted to install via pip on macOS 10.14 using Python 3.7 and pip3 version 18.0.

Got the following error:

    /private/var/folders/_9/fz23mhs14q135p8kh6f4dg980000gn/T/pip-install-g5ph25wz/yaramod/build/src/yaramod/yy/yy_parser.hpp:738:102: error: too many arguments provided to function-like macro invocation
          basic_symbol (typename Base::kind_type t, YY_RVREF (std::pair<nonstd::optional<std::uint64_t>, nonstd::optional<std::uint64_t>>) v, YY_RVREF (location_type) l);

Allow parsing of rules that are both private and global

First of all, let me say that I think this project is awesome and thanks for maintaining it!

Now the issue. The YARA documentation is quite clear that private global rules are allowed:

You can apply both private and global modifiers to a rule, resulting in a global rule that does not get reported by YARA but must be satisfied.

Indeed, it works with YARA:

[nix-shell:/tmp/test]$ cat test.yar
private global rule Kot
{
    strings:
        $a = "dummy1"

    condition:
        $a
}


rule Hmm
{
    condition:
        Kot
}


[nix-shell:/tmp/test]$ yara -rs test.yar .
Hmm ./test.yar
Hmm ./dummy

However, yaramod does not currently handle this properly:

import yaramod

test = """
private global rule Kot
{
    strings:
        $a = "dummy1"

    condition:
        $a
}
"""

print(yaramod.Yaramod().parse_string(test).text_formatted)

[nix-shell:/tmp/test]$ python test.py
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    print(yaramod.Yaramod().parse_string(test).text_formatted)
yaramod.ParserError: Error at 2.9-14: Syntax error: Unexpected global, expected one of rule

I couldn't find any issues similar to this one, so I've created a new one.

Please see my linked PR for proposed fix.

Possible g++ 7.3.1 Internal Compiler error

While compiling 857db32 with g++ 7.3.1, compilation stopped with an error:
yaramod.git/deps/variant/variant.hpp:1900:9: internal compiler error: unexpected expression ‘I’ of kind template_parm_index typename T = lib::type_pack_element_t<I, Ts...>, Please submit a full bug report,
With Clang 5.0.2, yaramod compiles fine. The full log of these build steps is attached. I can report this to GCC upstream, but I need some time to isolate the specific template pattern that triggers this error.
g++-7.3.1-internal-compiler-error.log

Adding a new meta can add it to wrong place in the TokenStream

Consider the following rule:

rule rule_with_metas {
    meta:
        int_meta = 42 // comment
    condition:
        true
}

After a new meta is added with

rule.add_meta('new_meta', yaramod.Literal(True))

the new meta is inserted before the comment instead of after it, so the comment ends up attached to the new meta:

rule rule_with_metas {
    meta:
        int_meta = 42
        new_meta = true // comment
    condition:
        true
}
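Here is a minimal sketch of the expected insertion point. It assumes a hypothetical flat token model of (kind, text) pairs, not yaramod's real TokenStream API. The new meta goes after the NEWLINE that ends the last meta line, so the trailing comment stays with int_meta.

```python
from typing import List, Tuple

# Hypothetical, simplified token model: (kind, text) pairs in one flat list.
# This is not yaramod's TokenStream API.
Token = Tuple[str, str]


def insertion_point_after_last_meta(tokens: List[Token]) -> int:
    """Index just past the NEWLINE that ends the line of the last meta value.

    Scanning forward to the end of the line, instead of stopping right after
    the META_VALUE token, keeps a trailing COMMENT on the line it was written on.
    """
    last_value = max(i for i, (kind, _) in enumerate(tokens) if kind == "META_VALUE")
    i = last_value + 1
    while i < len(tokens) and tokens[i][0] != "NEWLINE":
        i += 1
    return min(i + 1, len(tokens))


def add_meta(tokens: List[Token], key: str, value: str) -> None:
    pos = insertion_point_after_last_meta(tokens)
    tokens[pos:pos] = [("META_KEY", key), ("ASSIGN", "="),
                       ("META_VALUE", value), ("NEWLINE", "\n")]


def render(tokens: List[Token]) -> str:
    lines: List[str] = []
    current: List[str] = []
    for kind, text in tokens:
        if kind == "NEWLINE":
            lines.append(" ".join(current))
            current = []
        else:
            current.append(text)
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


tokens = [("META_KEY", "int_meta"), ("ASSIGN", "="), ("META_VALUE", "42"),
          ("COMMENT", "// comment"), ("NEWLINE", "\n"),
          ("KEYWORD", "condition:"), ("NEWLINE", "\n")]
add_meta(tokens, "new_meta", "true")
print(render(tokens))
```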

Allow module functions and attributes to have descriptions

For linting purposes, it would be useful if yaramod provided an option to assign a description to each module function, its arguments, etc. That way, you could look up what a function is supposed to do and pass that to a linter so that the user's editor can display it.

Add support for dynamic modules

Dynamic modules would allow us to define a module from a JSON (or other) description supplied from an outside source. This would allow anyone to introduce their own modules without forking the repository or modifying the code.
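Here is one way such a description could be consumed, sketched in Python. The JSON schema (name, attributes, functions, arguments, return) and the flat "module.symbol" table are hypothetical. They only illustrate turning an outside description into symbols without changing the code.

```python
import json
from typing import Dict

# Hypothetical module description; yaramod does not define this schema.
MODULE_JSON = """
{
  "name": "example",
  "attributes": [
    {"name": "version", "type": "integer"},
    {"name": "compiler", "type": "string"}
  ],
  "functions": [
    {"name": "match", "arguments": ["string", "integer"], "return": "bool"}
  ]
}
"""


def load_module(text: str) -> Dict[str, str]:
    """Flatten a module description into a 'module.symbol' -> type table."""
    desc = json.loads(text)
    module = desc["name"]
    symbols: Dict[str, str] = {}
    for attr in desc.get("attributes", []):
        symbols[f"{module}.{attr['name']}"] = attr["type"]
    for func in desc.get("functions", []):
        args = ",".join(func["arguments"])
        symbols[f"{module}.{func['name']}"] = f"({args})->{func['return']}"
    return symbols


symbols = load_module(MODULE_JSON)
```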

Add new import feature for deprecated functions

There are certain functions which were (or will be) deprecated in YARA. We don't want yaramod to recognize them by default, but since we have import features, we can solve this using those.

I propose adding a new import feature, Deprecated, which would additionally enable the deprecated functions (currently only cuckoo.signature.name). If this flag is not specified, deprecated functions won't be included in the symbols.
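This sketch shows how the proposed flag could gate symbols, using Python's enum.Flag. The Features class and the REQUIRED_FEATURE table are hypothetical stand-ins for yaramod's import features and symbol tables. cuckoo.signature.name is the deprecated function named above.

```python
from enum import Flag, auto
from typing import Dict, Optional, Set


class Features(Flag):
    # Hypothetical import-feature flag modelled on the proposal.
    DEPRECATED = auto()


# Which feature (if any) a symbol requires before it is added to the symbols.
REQUIRED_FEATURE: Dict[str, Optional[Features]] = {
    "cuckoo.network.http_request": None,
    "cuckoo.signature.name": Features.DEPRECATED,
}


def visible_symbols(enabled: Features) -> Set[str]:
    """Symbols that are always available plus those unlocked by `enabled`."""
    return {name for name, needed in REQUIRED_FEATURE.items()
            if needed is None or needed in enabled}
```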

Replacing flex + bison with pure C++ parser generator

There are a lot of problems with flex + bison: users on Windows and macOS have trouble locating them through CMake. On top of that, flex + bison builds for Windows depend on a single person on SourceForge who provides them. In the future, we might want to get rid of flex + bison altogether and use a C++ parser generator. Some possible alternatives, which I will add as I run into them:

On system without cmake, pip install silently fails

When installing with pip on a system without CMake (for example, in a Docker container), pip install finishes without errors:

$ pip install yaramod
Collecting yaramod
  Downloading https://files.pythonhosted.org/packages/30/ab/015d137c65bd189d7a29b55f53d9f6ec5c98a890d14dc052b885e9175449/yaramod-3.3.3.tar.gz (605kB)
    100% |████████████████████████████████| 614kB 1.7MB/s
Building wheels for collected packages: yaramod
  Failed building wheel for yaramod
  Running setup.py clean for yaramod
Failed to build yaramod
Installing collected packages: yaramod
  Running setup.py install for yaramod ... done
Successfully installed yaramod-3.3.3
root@3132c677476f:~# pip install yaramod
Requirement already satisfied: yaramod in /usr/local/lib/python3.7/site-packages (3.3.3)

But it can't be imported:

root@3132c677476f:~# python
>>> import yaramod
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'yaramod'

I would expect pip install to fail when the required modules can't be built.
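Here is a sketch of the expected behavior for a setup.py-style build script. It assumes a hypothetical require_tool helper, which is not part of yaramod's setup.py. The script checks for cmake before building and exits with an error, so pip reports a failure instead of installing a package without its compiled module.

```python
import shutil
import sys


def require_tool(name: str) -> str:
    """Return the full path to `name`, or abort the build with a clear message."""
    path = shutil.which(name)
    if path is None:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # which makes the surrounding pip install fail visibly.
        sys.exit(f"error: '{name}' was not found on PATH; it is required to build yaramod")
    return path


# Hypothetical use at the top of the build step:
# cmake = require_tool("cmake")
```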

Build fails at cmake stage in Debian

I tried to build on a clean Debian system (in Docker) following the steps in the README file.
I get an error when trying to run cmake:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
/retdec/deps/fnc-patterns/deps/yaramod/src/FLEX_INCLUDE_DIR
   used as include directory in directory /retdec/deps/fnc-patterns/deps/yaramod/src
-- Configuring incomplete, errors occurred!

Start using Travis CI also for Windows

Travis should now support a Windows CI environment, but it's still in early access. There are currently multiple issues, some small, some bigger:

  • The biggest issue right now is that secrets cannot be used because they randomly bring down the VM. This is discussed here.
  • The PyPI provider does not work on Windows. I have reported this issue; however, it is not a blocker, because the upload can easily be done using the script provider and a custom script that sets up .pypirc, installs twine, runs sdist/bdist_wheel and runs twine upload.

The PoC is already available in my fork.

Add support for YARA 3.11.0

There are some new things in YARA 3.11.0 which we'll need to implement when moving to that version:

  • The xor modifier can now specify a key range (like xor(1-255)). Since modifiers now have parameters, we'll need to store them in something other than a plain enum, possibly a variant (to mimic Rust-like enums)
  • The dotnet module contains field_offsets and number_of_field_offsets
  • New string modifier private
  • xor modifier should not be usable with regular expressions
  • crc32 in hash module
  • Update YARA_SYNTAX_VERSION to 3.11

Whoever works on this ticket, please go through the changes between 3.10 and 3.11 once again, because I might have missed something.
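This sketch models modifiers as separate types, the Python analogue of a variant or Rust-like enum, so that xor can carry its range. The class names are hypothetical. The 0 to 255 bound check is an assumption based on xor keys being single bytes.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Ascii:
    def text(self) -> str:
        return "ascii"


@dataclass(frozen=True)
class Private:
    def text(self) -> str:
        return "private"


@dataclass(frozen=True)
class Xor:
    # None/None means plain `xor`; otherwise the key range, e.g. xor(1-255).
    low: Optional[int] = None
    high: Optional[int] = None

    def __post_init__(self) -> None:
        if (self.low is None) != (self.high is None):
            raise ValueError("xor range needs both bounds")
        # Assumed bounds: single-byte keys, lower bound not above upper bound.
        if self.low is not None and not 0 <= self.low <= self.high <= 255:
            raise ValueError("xor range must satisfy 0 <= low <= high <= 255")

    def text(self) -> str:
        if self.low is None:
            return "xor"
        return f"xor({self.low}-{self.high})"


Modifier = Union[Ascii, Private, Xor]


def check_modifiers(is_regexp: bool, modifiers: List[Modifier]) -> None:
    """Reject xor on regular expressions, as listed in the ticket."""
    if is_regexp and any(isinstance(m, Xor) for m in modifiers):
        raise ValueError("xor modifier cannot be used with regular expressions")


print(" ".join(m.text() for m in [Ascii(), Xor(1, 255), Private()]))
```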

Improve links between individual constructs in internal representation

Right now, the internal representation often loses the links between individual constructs. One such example is this one:

rule abc
{
        condition:
                true
}

rule def
{
        condition:
                abc
}

In this case, even though we check that abc exists in the symbol table while parsing the condition of def, we forget this link. There is then no way to get directly from abc in def's condition to the Rule instance of abc without looking up all rules in the YARA file.

This ticket aims to improve the internal representation so that we keep these links. It would then be much easier to reach other parts of a YARA file without whole-file lookups.

Use-case:

I would like to rename a rule and all links to it. If I do

abc_rule.name = 'XYZ'

then I would expect the output to be

rule XYZ
{
        condition:
                true
}

rule def
{
        condition:
                XYZ
}

and not

rule XYZ
{
        condition:
                true
}

rule def
{
        condition:
                abc
}

These changes relate mostly to links between rules, but I would also like to have something similar for strings.

Technical details:

This may be easier for rules, because it can be solved with a pointer into the symbol table. Renaming a rule would rename its symbol-table record and thus effectively rename it everywhere else at zero cost. It can be harder for strings, because they are stored in a trie-like structure for fast lookup by string prefix. That could possibly be solved by storing a pointer directly to the string itself, which shouldn't change even if the string moves elsewhere in the trie.
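This sketch shows the pointer-to-symbol idea for rules, using hypothetical RuleSymbol, RuleReference and SymbolTable classes. A reference holds the symbol object instead of copying its name, so renaming the symbol-table record renames every use.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RuleSymbol:
    name: str


@dataclass
class RuleReference:
    # Holds the symbol object itself, not a copy of its name.
    symbol: RuleSymbol

    @property
    def text(self) -> str:
        return self.symbol.name


class SymbolTable:
    def __init__(self) -> None:
        self._rules: Dict[str, RuleSymbol] = {}

    def define(self, name: str) -> RuleSymbol:
        symbol = RuleSymbol(name)
        self._rules[name] = symbol
        return symbol

    def reference(self, name: str) -> RuleReference:
        # Raises KeyError for an undefined rule, like the existing existence check.
        return RuleReference(self._rules[name])

    def rename(self, old: str, new: str) -> None:
        symbol = self._rules.pop(old)
        symbol.name = new  # every RuleReference now renders the new name
        self._rules[new] = symbol


table = SymbolTable()
table.define("abc")
abc_in_def_condition = table.reference("abc")
table.rename("abc", "XYZ")
print(abc_in_def_condition.text)  # XYZ
```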

"error: ‘realpath’ was not declared in this scope" on cygwin64

[ 83%] Building CXX object src/CMakeFiles/yaramod.dir/utils/filesystem.cpp.o
/cygdrive/c/Users/Creation/Downloads/git/retdec/external/src/yaramod-project/src/utils/filesystem.cpp: In function ‘std::string yaramod::detail::absolutePath(const string&)’:
/cygdrive/c/Users/Creation/Downloads/git/retdec/external/src/yaramod-project/src/utils/filesystem.cpp:86:6: error: ‘realpath’ was not declared in this scope
  if (realpath(path.c_str(), absolutePathStr) == nullptr)
      ^~~~~~~~
/cygdrive/c/Users/Creation/Downloads/git/retdec/external/src/yaramod-project/src/utils/filesystem.cpp:86:6: note: suggested alternative: ‘path’
  if (realpath(path.c_str(), absolutePathStr) == nullptr)
      ^~~~~~~~
      path
make[2]: *** [src/CMakeFiles/yaramod.dir/build.make:373: src/CMakeFiles/yaramod.dir/utils/filesystem.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:109: src/CMakeFiles/yaramod.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Modifying a rule with ModifyingVisitor should also be reflected in getTextFormatted()

This ticket is related to #73, but it deserves its own ticket because it differs in some specific aspects.

Basically, we provide an option to modify the condition through ModifyingVisitor, but these changes are not reflected when getting text_formatted back out of YaraFile. This is because we don't modify the token streams directly when modifying the condition. For example, consider the following code:

import yaramod

class RegexpCaseInsensitiveAdder(yaramod.ModifyingVisitor):
    def add(self, yara_file: yaramod.YaraFile):
        for rule in yara_file.rules:
            new_condition = self.modify(rule.condition)
            print(new_condition.text)
            rule.condition = new_condition

    def visit_RegexpExpression(self, expr: yaramod.Expression):
        # Replace every regular expression with /abc/i.
        return yaramod.regexp('abc', 'i').get()

ymod = yaramod.Yaramod()
yfile = ymod.parse_file('/tmp/ruleset.yar')

regexp_icase_adder = RegexpCaseInsensitiveAdder()
regexp_icase_adder.add(yfile)

print(yfile.text_formatted)

This replaces all regexes in conditions with /abc/i, but text_formatted still returns the original condition. It is expected to return the modified one.

Add support for multiple metadata identifiers with the same name

Hi,

Please add support for multiple metadata identifiers with the same name to yaramod, e.g.:

rule rule_with_multiple_hash_meta_tags
{
	meta:
		hash = "0000000000000000000000000000000000000000000000000000000000000000"
		hash = "1111111111111111111111111111111111111111111111111111111111111111"
		hash = "2222222222222222222222222222222222222222222222222222222222222222"
	strings:
		$dummy = "dummy"
	condition:
		any of them
}

All three hash identifiers should be accessible; this is valid in YARA.

Thanks,
Jakub
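This sketch shows a meta container that keeps duplicates. It assumes metas are stored as an ordered list of (key, value) pairs instead of a map keyed by identifier. The Metas class and its methods are hypothetical.

```python
from typing import List, Optional, Tuple


class Metas:
    """Ordered (key, value) pairs; unlike a dict, keys may repeat."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str]] = []

    def add(self, key: str, value: str) -> None:
        self._items.append((key, value))

    def get_all(self, key: str) -> List[str]:
        # Values come back in source order, duplicates included.
        return [v for k, v in self._items if k == key]

    def get_first(self, key: str) -> Optional[str]:
        values = self.get_all(key)
        return values[0] if values else None


metas = Metas()
for digit in "012":
    metas.add("hash", digit * 64)
print(len(metas.get_all("hash")))  # 3
```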

Fix warnings about deprecated directives when using bison 3.3

Bison 3.3 deprecated some of the directives we use:

../src/parser/yy/parser.y:59.1-14: warning: deprecated directive, use ‘%define parse.error verbose’ [-Wdeprecated]
 %error-verbose
 ^~~~~~~~~~~~~~
../src/parser/yy/parser.y:62.1-36: warning: deprecated directive, use ‘%define api.parser.class { Parser }’ [-Wdeprecated]
 %define parser_class_name { Parser }
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The problem is that my Linux distribution, for example, still uses bison 3.0.4, which does not implement the new directives yet. So we will probably need to define them conditionally. Alternatively, we could leave the warnings as they are and hope that they won't turn into errors before we get rid of flex and bison.

Provide regexp units through Python bindings

Regular expressions are parsed down to individual regular expression units (character classes, alternations, meta characters like ., \s, ...). These units are available through the C++ API but not through the Python bindings. It would be nice to expose them in case anybody wants to process them from Python.

It should be enough to add all of them into src/python/yaramod_python.cpp with pointers to their respective C++ counterparts.

Some Rule member functions do not update the corresponding TokenStream correctly

Some of the less frequently used Rule methods do not handle the TokenStream well, and the formatted text can be left unchanged. This affects the methods Rule::removeString, Rule::setMetas and Rule::setCondition. I recommend implementing setMetas using the Rule::removeMetas and Rule::addMeta methods, which have already been fixed in #83.

Unify types in Literal

Literal is now implemented as a variant that can store multiple types, which is OK. However, some of these types are unnecessary duplicates that only make the interface less user-friendly. Those are:

  • int
  • int64_t
  • uint64_t

Integer meta values are stored as integers. If you have an integer meta and ask for int, the call can fail because the stored type is std::int64_t (example). We should unify these types so that all integers use the same representation in literals. That would probably be int64_t, so that signed integers can be stored as well.
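This sketch shows a literal with one integer representation, using hypothetical names. Python ints are unbounded, so the int64 range is enforced explicitly. Under this choice, unsigned values above 2**63 - 1 would no longer fit.

```python
from typing import Union

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


class Literal:
    """Sketch of a literal holding a string, a bool, or a single int64 kind."""

    def __init__(self, value: Union[str, bool, int]) -> None:
        # bool is a subclass of int in Python, so it has to be checked first.
        if isinstance(value, bool):
            self.kind = "bool"
        elif isinstance(value, int):
            if not INT64_MIN <= value <= INT64_MAX:
                raise OverflowError(f"{value} does not fit into int64")
            self.kind = "int"
        elif isinstance(value, str):
            self.kind = "string"
        else:
            raise TypeError(f"unsupported literal type: {type(value).__name__}")
        self.value = value

    def get_int(self) -> int:
        # One accessor for every integer literal, instead of int/int64/uint64 ones.
        if self.kind != "int":
            raise TypeError(f"literal holds a {self.kind}, not an integer")
        return self.value
```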

Placement of and/or is wrong if ands/ors are misplaced

The current autoformatter expects ands and ors to be placed in a certain way in order to autoformat the newlines around them. This is the expected format:

A and
B and
C and
...

However, that only works if you write A and B and C, or if you already use the expected format. If you instead write this

A
and B
and C

you'll end up with

A
and
B
and
C

This is not what is expected. Please look into why it happens and try to come up with a solution that handles any kind of whitespace shenanigans around ands and ors. Thank you.
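This sketch shows whitespace-independent formatting: it discards the original whitespace, then emits each and/or at the end of its line. It assumes operands are single tokens. A real implementation would work on the parsed condition rather than on str.split().

```python
from typing import List


def format_chain(condition: str) -> str:
    """Put each 'and'/'or' at the end of its line, whatever the input layout."""
    lines: List[str] = []
    current: List[str] = []
    # str.split() with no argument splits on any whitespace, newlines included.
    for token in condition.split():
        current.append(token)
        if token in ("and", "or"):
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


print(format_chain("A\nand B\nand C"))
```

Because the input layout is thrown away first, A and B and C written on one line and the misplaced variant above both come out in the expected format.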

Parsing of comments

We currently ignore all comments when parsing. If we ever want to use yaramod for formatting YARA rules, we need to keep comments rather than strip them.

Handling of square brackets inside classes in regexes is broken

When you have a regexp like /[[]*+]/, yaramod handles it incorrectly and you end up with an error:

yaramod.ParserError: Error at 8.28: Syntax error: Unexpected regexp +, expected one of (, ), /, regexp |, regexp ?, regexp ^, regexp $, regexp ., regexp ., regexp \w, regexp \W, regexp \s, regexp \S, regexp \d, regexp \D, regexp \b, regexp \B, regexp class

If you remove the first ] from the regexp (so /[[*+]/), it works. Square brackets inside classes are handled incorrectly, even when you escape them (\]). We don't have to build a parser so robust that it handles unescaped brackets, but I would expect it not to fail when they are escaped.
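This sketch finds the end of a character class so that an escaped \] never closes it. It also treats a ] right after [ or [^ as a literal, which is the common regex convention. Whether YARA follows that rule is an assumption to check.

```python
def class_end(pattern: str, start: int) -> int:
    """Return the index of the ']' that closes the class opened at pattern[start]."""
    if pattern[start] != "[":
        raise ValueError("start must point at '['")
    i = start + 1
    if i < len(pattern) and pattern[i] == "^":
        i += 1  # negated class
    if i < len(pattern) and pattern[i] == "]":
        i += 1  # a ']' right after '[' or '[^' is a literal
    while i < len(pattern):
        if pattern[i] == "\\":
            i += 2  # skip the escaped character, e.g. \]
            continue
        if pattern[i] == "]":
            return i
        i += 1  # any other character, including '[', is a class member
    raise ValueError("unterminated character class")
```

With the escaped pattern [[\]*+] the class runs to the final bracket at index 6. With the unescaped [[]*+] it closes at index 2, which leaves *+ outside the class.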

Multi-line Hex strings parsing error

The following YARA rule cannot be parsed:

rule example {

        strings:
                $string1 = { FF FF FF FF FF FF FF FF FF FF FF FF
                             FF FF FF FF FF FF FF FF FF FF FF FF }

        condition:
                all of them
}

It causes the following error:

Error at 4.65: syntax error, unexpected END, expecting LP or LSQB or HEX_WILDCARD or HEX_NIBBLE
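This sketch shows a hex-string tokenizer in which newlines are ordinary whitespace, using a hypothetical tokenize_hex function. It is not yaramod's flex lexer. It covers bytes, ?-wildcards (including nibbles like ?A), jumps like [2-4] and alternation parentheses.

```python
import re
from typing import List

# One token: a byte or wildcard (FF, ??, ?A), a jump ([2-4]), or ( | ).
# The leading \s* skips any whitespace, including newlines.
HEX_TOKEN = re.compile(r"\s*([0-9A-Fa-f?]{2}|\[[0-9-]*\]|[()|])")


def tokenize_hex(body: str) -> List[str]:
    """Tokenize the text between { and } of a hex string."""
    body = body.rstrip()
    tokens: List[str] = []
    pos = 0
    while pos < len(body):
        match = HEX_TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}: {body[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens
```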

Allow obtaining global symbols (like modules, their functions, attributes) even without created YaraFile

The motivation here is to be able to obtain a list of all functions in a certain module without creating a YaraFile object. We should be able to do something like yaramod::getSymbol("cuckoo"), which would search only the global symbols. Optionally, you would also be able to pass Import Features there. This would be useful for listing those symbols (for example, for code suggestions) without having to create a dummy YaraFile object with all imports.

Add option to specify which style to use for curly braces placement

Currently, curly braces generated by yaramod are always in K&R (Egyptian) style, like this:

rule abc {
    ...
}

It would be nice to be able to choose between the existing style and this one:

rule abc
{
    ...
}

Once we implement #17, it will make sense to apply this only to rules generated by the builder itself, not to parsed rules, although we might allow this option to override that behavior.
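This sketch shows the proposed option, with hypothetical BraceStyle and format_rule names. The second style, with the opening brace on its own line, is commonly called Allman style.

```python
from enum import Enum
from typing import List


class BraceStyle(Enum):
    KR = "kr"          # rule abc {
    ALLMAN = "allman"  # rule abc, then { on its own line


def format_rule(name: str, body_lines: List[str], style: BraceStyle) -> str:
    if style is BraceStyle.KR:
        header = f"rule {name} {{"
    else:
        header = f"rule {name}\n{{"
    body = "\n".join("    " + line for line in body_lines)
    return f"{header}\n{body}\n}}"


print(format_rule("abc", ["condition:", "    true"], BraceStyle.ALLMAN))
```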

Different info in license summary list and full license text list in LICENSE-THIRD-PARTY

The LICENSE-THIRD-PARTY file starts with this list of licenses:

yaramod uses the following third-party libraries or other resources:
1) Google Test: https://github.com/google/googletest
2) tl-cpputils: https://github.com/avast-tl/tl-cpputils
3) optional_lite: https://github.com/martinmoene/optional-lite

And these full license texts:

1) Google Test
2) optional_lite
3) pybind11
4) variant

I don't know which licenses actually apply. Please verify this and make sure both lists are correct and consistent with each other.

Create Python bindings for creating Literal type

There is currently no way to create a Literal in the Python bindings. For example, Rule.add_meta cannot be called, because its second parameter must be a Literal. We should provide bindings for all the constructors we have in the C++ part.

Builder does not allow creating a string meta with an empty string

YaraRuleBuilder does not allow creating a string meta with an empty string as its value, but it should, because an empty value is still a valid value. The following exception is thrown:

C++ exception with description "YaraRuleBuilder error: Error: String-Meta key and value must be non-empty." thrown in the test body.
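This sketch shows the relaxed check, using a hypothetical validate_string_meta function. Only the key has to be non-empty, while an empty value ("") is accepted.

```python
def validate_string_meta(key: str, value: str) -> None:
    """Hypothetical check: the key must be non-empty; the value may be ""."""
    if not key:
        raise ValueError("String-Meta key must be non-empty.")
    if not isinstance(value, str):
        raise TypeError("String-Meta value must be a string.")
    # An empty value is valid YARA (e.g. note = ""), so it is not rejected.
```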

Add install target to CMake

Currently, there is no install target in CMake, so if you use yaramod as a dependency in your project, you need to set up paths to include directories etc. manually.

Preserve newlines in the condition when parsing it

We currently lose all information about newlines when parsing a rule condition, so when you print the parsed rule back, the whole condition is on a single line. If we want to format rules automatically, newlines need to be preserved to keep proper indentation.

Build Windows Python wheels with static runtime

Currently, we build the yaramod Python wheel for Windows with the dynamic runtime, which requires users to have the proper version of the VC++ 2019 runtime installed. We should try to find a way to build it with the static runtime, as that would be more convenient for some users.

I don't know whether it will be possible, as it also depends on the Python DLL and what pybind offers, but it is still worth looking into.

Handle rule modifiers the same way as string modifiers

Commit d59cc83 added support for multiple rule modifiers at the same time. It is good that we handle this now, but I think we should approach it the same way as string modifiers (https://github.com/avast/yaramod/blob/master/include/yaramod/types/string_modifier.h#L18).

The reason is that we'll pay the price whenever a new rule modifier is added. Right now, we store the modifiers as a plain enum, and since there are only 2 modifiers, there are 4 possible combinations. Add one more and there are already 8; one more and there are 16.

It is not critical to handle this right now, but we should be prepared because as I said, we are going to pay the price eventually.
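This sketch uses bit flags, which is one alternative to listing every combination in an enum. It is not the string-modifier class design linked above. Each modifier is one bit, so n modifiers cover all 2**n combinations without enumerating them. RuleModifier and modifiers_text are hypothetical.

```python
from enum import Flag, auto


class RuleModifier(Flag):
    # One bit per modifier: a new modifier adds one member, while every
    # combination stays representable without being listed explicitly.
    PRIVATE = auto()
    GLOBAL = auto()


# Fixed output order so that "private global rule" round-trips.
_ORDER = [RuleModifier.PRIVATE, RuleModifier.GLOBAL]


def modifiers_text(mods: RuleModifier) -> str:
    return " ".join(m.name.lower() for m in _ORDER if m in mods)


print(modifiers_text(RuleModifier.PRIVATE | RuleModifier.GLOBAL))
```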

Investigate how to deal with long AndExpression and OrExpression chains in AST

Very long AndExpression and OrExpression chains often run out of available stack during Visitor-like traversal. We should deal with this situation so that it doesn't prevent the analysis of any ruleset.

There are 2 possible approaches; we should investigate which one is more appropriate (or suggest something else, I am open to new ideas):

  1. Come up with a way to traverse the AST non-recursively without altering the current interface too much. This would require a complete overhaul of how we approach traversal right now. It would be easier for ObservingVisitor, because it doesn't need immediate results from visiting child nodes, but much harder for ModifyingVisitor.

  2. Store these chains differently in the AST: use N-ary nodes instead of binary nodes where it makes sense. This however would not prevent every single problem, such as someone writing A and B or C and D or E ... just to throw us off. That case could be handled with a single LogicalExpression for both operators that stores not only the child nodes but also the relation between each pair of consecutive nodes. The question is whether all this is really worth it.
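This sketch shows option 1 for a left-deep And chain, using hypothetical Leaf and And node types. An explicit stack replaces recursion, so depth is limited by memory rather than by the call stack. The flat list it returns is also roughly what an N-ary node from option 2 would store.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    name: str


@dataclass
class And:
    left: "Node"
    right: "Node"


Node = Union[Leaf, And]


def flatten_and(root: Node) -> List[str]:
    """Collect the leaves of an And tree in source order without recursion."""
    out: List[str] = []
    stack: List[Node] = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, And):
            # Push right first so the left operand is visited first.
            stack.append(node.right)
            stack.append(node.left)
        else:
            out.append(node.name)
    return out


def build_chain(n: int) -> Node:
    """Build c0 and c1 and ... as a left-deep binary tree, iteratively."""
    node: Node = Leaf("c0")
    for i in range(1, n):
        node = And(node, Leaf(f"c{i}"))
    return node
```

A recursive visitor would exceed Python's default recursion limit on build_chain(50_000), while flatten_and handles it.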

Add CMake option to build yaramod with Windows static runtime

PR #82 partially solved this: yaramod users can pass CXX flags that contain /MT instead of /MD, and yaramod (and its dependencies) will then be built properly with the MSVC static runtime.
But it would be much nicer to have an option that takes care of this, something like YARAMOD_BUILD_STATIC_RUNTIME (as in Capstone/Keystone) or YARAMOD_MSVC_STATIC_RUNTIME (as in RetDec).

Allow parsing of incomplete rules

It would be nice to also parse rules in some kind of IncompleteRules mode, which would ignore symbols missing from the symbol tables. This would allow us, for example, to autoformat rules even without the includes that provide other rules the autoformatted rule refers to.

For example, let's say I have

rule abc {
    condition:
        xyz
}

xyz is not defined here, but it might come from some include later, so we would just parse it silently and ignore the missing symbol.
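This sketch shows the resolution step in both modes, using a hypothetical resolve_identifiers function. Strict mode fails on an unknown identifier, while IncompleteRules mode collects it and continues.

```python
from typing import Iterable, Set


def resolve_identifiers(identifiers: Iterable[str], known: Set[str],
                        incomplete: bool) -> Set[str]:
    """Return the identifiers that could not be resolved.

    Strict mode (incomplete=False) raises on the first unknown identifier;
    the hypothetical IncompleteRules mode records it and keeps going.
    """
    missing: Set[str] = set()
    for ident in identifiers:
        if ident in known:
            continue
        if not incomplete:
            raise NameError(f"undefined identifier '{ident}'")
        missing.add(ident)
    return missing
```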