Go-Yara is used in my program. Now I want to reload a batch of rules according to the

Without restarting the program, repeated loading of some rules keeps increasing memory (go-yara issue, closed)

commented on May 26, 2024

Without restarting the program, repeated loading of some rules keeps increasing memory

from go-yara.

Comments (12)

hillu commented on May 26, 2024

I have added a few endpoints to your sample in order to trigger GC and to log malloc statistics using the malloc_stats(3) function. Here's what I have found:
This is the initial state (before any rules have been parsed):

Arena 0:
system bytes     =     135168
in use bytes     =       4752
Arena 1:
system bytes     =     135168
in use bytes     =       2896
Arena 2:
system bytes     =     135168
in use bytes     =       2896
Arena 3:
system bytes     =     135168
in use bytes     =       2896
Arena 4:
system bytes     =     135168
in use bytes     =       2896
Total (incl. mmap):
system bytes     =     675840
in use bytes     =      16336
max mmap regions =          0
max mmap bytes   =          0

It turns out that having YARA parse a ruleset even just once will considerably grow Arena 4:

Arena 4:
system bytes     =   16281600
in use bytes     =   15946016

Forcing GC will cause the "in use bytes" value to drop, but the "system bytes" value will stay at its previous level.

Arena 5:
system bytes     =   16281600
in use bytes     =      48112

Note that for some reason, Arena 4 is now called Arena 5, but the "system bytes" value stays the same. (I don't know enough about the GNU libc heap implementation to explain this.)

Parsing the ruleset multiple times without forcing a GC will cause the rulesets to be allocated to multiple arenas.

Also note that 45216 bytes have apparently not been freed.
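
The figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original sample: `arenaStats` and `parseArenas` are hypothetical helpers that read text in the malloc_stats(3) layout quoted above and compute, per arena, how much memory is still in use and how much malloc is merely holding on to.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// arenaStats holds the two counters that malloc_stats(3) prints per arena.
type arenaStats struct {
	Name        string
	SystemBytes int64
	InUseBytes  int64
}

// Retained returns the bytes obtained from the OS that are not currently
// handed out to the program.
func (a arenaStats) Retained() int64 { return a.SystemBytes - a.InUseBytes }

// parseArenas reads text in the "Arena N:" / "system bytes = ..." layout
// and returns one entry per arena, skipping the "Total" section.
func parseArenas(text string) ([]arenaStats, error) {
	var arenas []arenaStats
	inArena := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "Arena "):
			arenas = append(arenas, arenaStats{Name: strings.TrimSuffix(line, ":")})
			inArena = true
		case strings.HasPrefix(line, "Total"):
			inArena = false
		case inArena && strings.Contains(line, "="):
			key, val, _ := strings.Cut(line, "=")
			n, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64)
			if err != nil {
				return nil, fmt.Errorf("parse %q: %w", line, err)
			}
			cur := &arenas[len(arenas)-1]
			switch strings.TrimSpace(key) {
			case "system bytes":
				cur.SystemBytes = n
			case "in use bytes":
				cur.InUseBytes = n
			}
		}
	}
	return arenas, sc.Err()
}

func main() {
	// Values copied from the post-GC output quoted above.
	const afterGC = `Arena 5:
system bytes     =   16281600
in use bytes     =      48112`
	const baselineInUse = 2896 // "in use bytes" of that arena before any parsing

	arenas, err := parseArenas(afterGC)
	if err != nil {
		panic(err)
	}
	a := arenas[0]
	fmt.Println("not freed:", a.InUseBytes-baselineInUse) // not freed: 45216
	fmt.Println("retained by malloc:", a.Retained())      // retained by malloc: 16233488
}
```

Under these numbers, roughly 45 KB is still allocated after the GC, while about 16 MB is memory that malloc keeps for reuse rather than memory the program still holds.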

malloc(3) keeps freed buffers around for subsequent reuse, which is one reason why the "system bytes" values do not drop. What looks like a huge memory leak may still be a leak in the code, but not as large as one would think. Most of the apparent leak has to do with the GNU libc malloc(3) implementation and probably fragmentation.

The Dgraph developers seem to have run into similar effects and their solution is to use jemalloc. One can override the default malloc(3) implementation with jemalloc using LD_PRELOAD on Linux:

: ; LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./main

Of course this is not a proper solution but it may help you in the short term. I'll try to figure out how to fix this with the standard malloc(3).


hillu commented on May 26, 2024

My best guess is that the Go garbage collector does not collect the Rules object (which would cause yr_rules_destroy to be called), but without a reproducer, it is hard to tell.

Is the size of the memory leak similar to the size of the compiled ruleset?


hillu commented on May 26, 2024

Can you share some code sample that demonstrates the behavior?


commented on May 26, 2024

Can you share some code sample that demonstrates the behavior?

sample: https://github.com/qtmee/yara-memory-leak.git

Open http://localhost:3000/i in a browser to simulate the recreate command.


hillu commented on May 26, 2024

Does adding runtime.GC() after replacing the Rules instance fix the problem for you?

How about explicitly calling s.Rules.Destroy() right before replacing it?
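
The second suggestion can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the sample repository: `destroyer`, `ruleHolder`, and `fakeRules` are hypothetical names, and the only thing borrowed from go-yara is that its Rules type has a Destroy method. The holder swaps in the new ruleset under a lock and destroys the old one immediately instead of waiting for the garbage collector.

```go
package main

import (
	"fmt"
	"sync"
)

// destroyer is the only behaviour the holder needs. go-yara's Rules type
// has a Destroy method; everything else in this sketch is hypothetical.
type destroyer interface {
	Destroy()
}

// ruleHolder guards the active ruleset so that reloads and scans do not race.
type ruleHolder struct {
	mu    sync.RWMutex
	rules destroyer
}

// Replace installs next and destroys the previous ruleset, if any. Holding
// the write lock ensures no reader inside Use still touches the old ruleset.
func (h *ruleHolder) Replace(next destroyer) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.rules != nil {
		h.rules.Destroy()
	}
	h.rules = next
}

// Use runs fn with the active ruleset while holding a read lock.
func (h *ruleHolder) Use(fn func(destroyer)) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	fn(h.rules)
}

// fakeRules records whether Destroy was called, standing in for a real ruleset.
type fakeRules struct {
	id        int
	destroyed bool
}

func (f *fakeRules) Destroy() { f.destroyed = true }

func main() {
	var h ruleHolder
	first, second := &fakeRules{id: 1}, &fakeRules{id: 2}

	h.Replace(first)  // initial load
	h.Replace(second) // reload: first is destroyed right away

	fmt.Println(first.destroyed, second.destroyed) // true false
}
```

Destroying the old ruleset eagerly frees its memory back to the allocator, but it does not by itself shrink the process footprint if the allocator keeps the freed pages.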


commented on May 26, 2024

Does adding runtime.GC() after replacing the Rules instance fix the problem for you?

How about explicitly calling s.Rules.Destroy() right before replacing it?

I have tried both approaches, and memory usage did not drop.


commented on May 26, 2024

Thank you very much for your answer. I will keep an eye on this problem and, for now, try the solution you have provided.


hillu commented on May 26, 2024

I have taken a closer look at the various malloc tuning parameters documented in the GNU libc manual, but they don't seem to make any difference.

Rather than setting LD_PRELOAD, you can use a compiler build flag, though:

; go build -ldflags="-linkmode=external -extldflags=-ljemalloc"


commented on May 26, 2024

Do I need to modify the go-yara source code directly, for example by changing the malloc used in cgo to je_calloc? I have tried that but failed. Could you walk me through it again? Thank you.


hillu commented on May 26, 2024

No, you don't have to modify anything. Just install libjemalloc-dev from your distribution and use the -ldflags parameter I gave above.


commented on May 26, 2024

This problem has been solved. The YARA rules are no longer loaded directly from source. Instead, they are precompiled, and the precompiled rules are loaded on each reload.
