Comments (12)

hillu commented on May 26, 2024

I have added a few endpoints to your sample in order to trigger GC and to log malloc statistics using the malloc_stats(3) function. Here's what I have found:
This is the initial state (before any rules have been parsed):

Arena 0:
system bytes     =     135168
in use bytes     =       4752
Arena 1:
system bytes     =     135168
in use bytes     =       2896
Arena 2:
system bytes     =     135168
in use bytes     =       2896
Arena 3:
system bytes     =     135168
in use bytes     =       2896
Arena 4:
system bytes     =     135168
in use bytes     =       2896
Total (incl. mmap):
system bytes     =     675840
in use bytes     =      16336
max mmap regions =          0
max mmap bytes   =          0

It turns out that having YARA parse a ruleset even just once will considerably grow Arena 4:

Arena 4:
system bytes     =   16281600
in use bytes     =   15946016

Forcing GC will cause the "in use bytes" value to drop, but the "system bytes" value will stay at its previous level.

Arena 5:
system bytes     =   16281600
in use bytes     =      48112

Note that for some reason, Arena 4 is now called Arena 5, but the "system bytes" value stays the same. (I don't know enough about the GNU libc heap implementation to explain this.)

Parsing the ruleset multiple times without forcing a GC will cause the rulesets to be allocated to multiple arenas.

Also note that 45216 bytes have apparently not been freed.

malloc(3) keeps freed buffers around for subsequent reuse, which is one reason why the "system bytes" values do not drop. What looks like a huge memory leak may still involve a real leak in the code, but a much smaller one than the numbers suggest. Most of the apparent leak has something to do with the GNU libc malloc(3) implementation and probably with fragmentation.
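
To make the arithmetic behind these figures explicit, here is a minimal Go sketch that parses malloc_stats(3)-style text and reports, per arena, how much memory malloc holds without using it and how much is still in use above the initial baseline. The names arena and parseMallocStats are hypothetical helpers, not part of go-yara or glibc, and the parser only assumes the "Arena N:" / "system bytes" / "in use bytes" layout shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// arena holds the two counters that malloc_stats(3) prints per arena.
type arena struct {
	name   string
	system uint64 // bytes obtained from the OS
	inUse  uint64 // bytes currently handed out to the program
}

// parseMallocStats (hypothetical helper) returns one entry per "Arena N:"
// section. The "Total" section, including its mmap lines, is skipped.
func parseMallocStats(out string) ([]arena, error) {
	var arenas []arena
	inArena := false
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "Arena "):
			arenas = append(arenas, arena{name: strings.TrimSuffix(line, ":")})
			inArena = true
		case strings.HasPrefix(line, "Total"):
			inArena = false
		case inArena && strings.Contains(line, "="):
			key, val, _ := strings.Cut(line, "=")
			n, err := strconv.ParseUint(strings.TrimSpace(val), 10, 64)
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", line, err)
			}
			cur := &arenas[len(arenas)-1]
			switch strings.TrimSpace(key) {
			case "system bytes":
				cur.system = n
			case "in use bytes":
				cur.inUse = n
			}
		}
	}
	return arenas, sc.Err()
}

func main() {
	// Figures copied from the post-GC output quoted above.
	const afterGC = "Arena 5:\nsystem bytes     =   16281600\nin use bytes     =      48112\n"
	// "in use bytes" of the same arena before any rules were parsed.
	const baselineInUse = 2896

	arenas, err := parseMallocStats(afterGC)
	if err != nil {
		panic(err)
	}
	for _, a := range arenas {
		fmt.Printf("%s: held by malloc but unused = %d bytes, in use above baseline = %d bytes\n",
			a.name, a.system-a.inUse, a.inUse-baselineInUse)
	}
}
```

With the numbers above, the first value is the roughly 16 MB that malloc retains for reuse, and the second is the 45216 bytes that have apparently not been freed.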

The Dgraph developers seem to have run into similar effects and their solution is to use jemalloc. One can override the default malloc(3) implementation with jemalloc using LD_PRELOAD on Linux:

: ; LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./main

Of course this is not a proper solution but it may help you in the short term. I'll try to figure out how to fix this with the standard malloc(3).

from go-yara.

hillu commented on May 26, 2024

My best guess is that the Go garbage collector does not collect the Rules object (which would cause yr_rules_destroy to be called), but without a reproducer, it is hard to tell.

Is the size of the memory leak similar to the size of the compiled ruleset?

hillu commented on May 26, 2024

Can you share some code sample that demonstrates the behavior?

 avatar commented on May 26, 2024

Can you share some code sample that demonstrates the behavior?

sample: https://github.com/qtmee/yara-memory-leak.git

Open http://localhost:3000/i in a browser to simulate the recreate command.

hillu commented on May 26, 2024

Does adding runtime.GC() after replacing the Rules instance fix the problem for you?

How about explicitly calling s.Rules.Destroy() right before replacing it?
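
As a rough illustration of the two suggestions, here is a minimal Go sketch of replacing a ruleset while explicitly destroying the old one and then forcing a GC cycle. The destroyer interface, ruleHolder, and fakeRules are hypothetical names used only so the sketch runs without libyara; in the real program the value would be a *yara.Rules, which has a Destroy method. The sketch also assumes no scan is still using the old ruleset when it is swapped out.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// destroyer is the only part of *yara.Rules this sketch relies on.
// (Hypothetical interface, introduced for illustration.)
type destroyer interface {
	Destroy()
}

// ruleHolder (hypothetical) guards the currently active ruleset.
type ruleHolder struct {
	mu    sync.Mutex
	rules destroyer
}

// replace installs next, releases the previous ruleset's C-side memory
// right away instead of waiting for a finalizer, and then forces a GC.
// Assumption: no scan is still running against the old ruleset.
func (h *ruleHolder) replace(next destroyer) {
	h.mu.Lock()
	old := h.rules
	h.rules = next
	h.mu.Unlock()

	if old != nil {
		old.Destroy()
	}
	// runtime.GC only collects Go-managed objects; memory that the C
	// allocator keeps cached for reuse is not affected by it.
	runtime.GC()
}

// fakeRules stands in for *yara.Rules so the sketch runs without libyara.
type fakeRules struct {
	destroyed int
}

func (f *fakeRules) Destroy() { f.destroyed++ }

func main() {
	h := &ruleHolder{}
	first := &fakeRules{}
	h.replace(first)        // nothing to destroy yet
	h.replace(&fakeRules{}) // destroys first
	fmt.Println("first ruleset destroyed:", first.destroyed == 1)
}
```

Whether this lowers the numbers reported by the operating system depends on the C allocator, since freed memory is not necessarily returned to the system.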

 avatar commented on May 26, 2024

Does adding runtime.GC() after replacing the Rules instance fix the problem for you?

How about explicitly calling s.Rules.Destroy() right before replacing it?

I have tried both approaches, and memory usage has not dropped.

 avatar commented on May 26, 2024

Thank you very much for your answer. I will keep an eye on this problem and try the solution you have provided for now.

hillu commented on May 26, 2024

I have taken a closer look at the various malloc tuning parameters documented in the GNU libc manual, but they don't seem to make any difference.

Rather than setting LD_PRELOAD, you can use a compiler build flag, though:

; go build -ldflags="-linkmode=external -extldflags=-ljemalloc"

 avatar commented on May 26, 2024

Do I need to modify the go-yara source code directly and change the malloc used in cgo to je_calloc? I have tried that, but it failed. Could you walk me through it again? Thank you.

hillu commented on May 26, 2024

No, you don't have to modify anything. Just install libjemalloc-dev from your distribution and use the -ldflags parameter I gave above.

 avatar commented on May 26, 2024

This problem has been solved. We no longer load the YARA rules directly; instead, the rules are precompiled, and the precompiled rules are loaded on each reload.
