Comments (12)
I have added a few endpoints to your sample in order to trigger GC and to log malloc statistics using the malloc_stats(3) function. Here's what I have found:
This is the initial state (before any rules have been parsed):
Arena 0:
system bytes = 135168
in use bytes = 4752
Arena 1:
system bytes = 135168
in use bytes = 2896
Arena 2:
system bytes = 135168
in use bytes = 2896
Arena 3:
system bytes = 135168
in use bytes = 2896
Arena 4:
system bytes = 135168
in use bytes = 2896
Total (incl. mmap):
system bytes = 675840
in use bytes = 16336
max mmap regions = 0
max mmap bytes = 0
It turns out that having YARA parse a ruleset even just once will considerably grow Arena 4:
Arena 4:
system bytes = 16281600
in use bytes = 15946016
Forcing GC will cause the "in use bytes" value to drop, but the "system bytes" value will stay at its previous level.
Arena 5:
system bytes = 16281600
in use bytes = 48112
Note that for some reason, Arena 4 is now called Arena 5, but the "system bytes" value stays the same. (I don't know enough about the GNU libc heap implementation to explain this.)
Parsing the ruleset multiple times without forcing a GC will cause the rulesets to be allocated to multiple arenas.
Also note that 45216 bytes (48112 in use now, versus 2896 initially) have apparently not been freed.
malloc(3) keeps freed buffers around for subsequent reuse, which is one reason why the "system bytes" values do not drop. What looks like a HUGE memory leak may still be a leak in the code, but not as large as one would think. Most of the apparent leak has to do with the GNU libc malloc(3) implementation and probably fragmentation.
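To make the difference between the two counters concrete, here is a minimal sketch in plain Go that parses text in the malloc_stats(3) format quoted above and computes both figures. The names arenaStats, parseMallocStats and retained are made up for this illustration, and the input strings are just the Arena 4 / Arena 5 values from this comment.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// arenaStats holds the two counters that malloc_stats(3) prints per arena.
type arenaStats struct {
	Name   string
	System uint64 // bytes the allocator has obtained from the OS
	InUse  uint64 // bytes currently handed out to the program
}

// retained is memory held by the allocator but not in use by the program.
func (a arenaStats) retained() uint64 { return a.System - a.InUse }

// parseMallocStats reads "Arena N:" sections and stops at the "Total" section.
func parseMallocStats(text string) []arenaStats {
	var arenas []arenaStats
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "Arena ") {
			arenas = append(arenas, arenaStats{Name: strings.TrimSuffix(line, ":")})
			continue
		}
		if strings.HasPrefix(line, "Total") {
			break // the totals are not an arena
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok || len(arenas) == 0 {
			continue
		}
		n, err := strconv.ParseUint(strings.TrimSpace(val), 10, 64)
		if err != nil {
			continue
		}
		cur := &arenas[len(arenas)-1]
		switch strings.TrimSpace(key) {
		case "system bytes":
			cur.System = n
		case "in use bytes":
			cur.InUse = n
		}
	}
	return arenas
}

func main() {
	// Values for the arena that held the ruleset, copied from the output above.
	before := parseMallocStats("Arena 4:\nsystem bytes = 135168\nin use bytes = 2896\n")[0]
	after := parseMallocStats("Arena 5:\nsystem bytes = 16281600\nin use bytes = 48112\n")[0]

	fmt.Println("still allocated vs. baseline:", after.InUse-before.InUse) // 45216
	fmt.Println("held by malloc but unused:", after.retained())           // 16233488
}
```

So roughly 16 MB is memory that malloc is holding on to, while only about 45 KB is still actually allocated.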
The Dgraph developers seem to have run into similar effects and their solution is to use jemalloc. One can override the default malloc(3) implementation with jemalloc using LD_PRELOAD on Linux:
: ; LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./main
Of course this is not a proper solution but it may help you in the short term. I'll try to figure out how to fix this with the standard malloc(3).
from go-yara.
My best guess is that the Go garbage collector does not collect the Rules object (which would cause yr_rules_destroy to be called), but without a reproducer, it is hard to tell.
Is the size of the memory leak similar to the size of the compiled ruleset?
from go-yara.
Can you share some code sample that demonstrates the behavior?
from go-yara.
Sample: https://github.com/qtmee/yara-memory-leak.git
Open http://localhost:3000/i in a browser to simulate the recreate command.
from go-yara.
Does adding runtime.GC() after replacing the Rules instance fix the problem for you?
How about explicitly calling s.Rules.Destroy() right before replacing it?
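For illustration, the second suggestion could look roughly like the sketch below. It deliberately does not import go-yara: destroyer is a hypothetical stand-in interface for anything with a Destroy() method (such as the Rules value mentioned above), and ruleStore, replace and fakeRules are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// destroyer is a hypothetical stand-in for a compiled ruleset that owns
// C memory and releases it in Destroy().
type destroyer interface {
	Destroy()
}

// ruleStore holds the currently active ruleset.
type ruleStore struct {
	mu    sync.Mutex
	rules destroyer
}

// replace swaps in the new ruleset and explicitly destroys the old one,
// instead of waiting for the garbage collector to get around to it.
func (s *ruleStore) replace(next destroyer) {
	s.mu.Lock()
	old := s.rules
	s.rules = next
	s.mu.Unlock()

	if old != nil {
		old.Destroy()
	}
}

// fakeRules only records whether Destroy was called, for demonstration.
type fakeRules struct {
	name      string
	destroyed bool
}

func (f *fakeRules) Destroy() { f.destroyed = true }

func main() {
	v1 := &fakeRules{name: "v1"}
	v2 := &fakeRules{name: "v2"}

	s := &ruleStore{}
	s.replace(v1) // nothing to destroy yet
	s.replace(v2) // v1 is destroyed here

	fmt.Println(v1.name, "destroyed:", v1.destroyed)
	fmt.Println(v2.name, "destroyed:", v2.destroyed)
}
```

This sketch assumes no scan is still running against the old ruleset when it is destroyed; a real program would have to guarantee that, for example by holding the lock for the duration of each scan.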
from go-yara.
I have tried both approaches, and memory usage has not dropped.
from go-yara.
Thank you very much for your answer. I will keep following this problem and, for now, try the solution you have provided.
from go-yara.
I have taken a closer look at the various malloc tuning parameters documented in the GNU libc manual, but they don't seem to make any difference.
Rather than setting LD_PRELOAD, you can use a compiler build flag, though:
; go build -ldflags="-linkmode=external -extldflags=-ljemalloc"
from go-yara.
Do I need to modify the go-yara source code directly, changing the malloc used in cgo to je_calloc? I have tried but failed. Could you walk me through it again? Thank you.
from go-yara.
No, you don't have to modify anything. Just install libjemalloc-dev from your distribution and use the -ldflags parameter I gave above.
from go-yara.
This problem has been solved. The YARA rules are no longer loaded directly from source; they are precompiled, and the precompiled rules are loaded on each reload.
from go-yara.