The block size for the stream is probably 64KB or so, so a regex match on the order of 16 bytes should always work, since it is far smaller than the block size.
However, whether the match is found depends heavily on where it sits in the file. Test case:
const find = require("find-in")['default'];
const fs = require("fs");

let flen = 64 * 1024 * 3; // file length: three 64KB blocks
let rlen = 10;            // number of 'A's between the two 'B's

let test = (i) => {
  // Fill the file with spaces, then write B + ten A's + B starting at i - 1.
  let buf = Buffer.alloc(flen, ' ', 'utf8');
  buf.fill('B', i - 1, i + rlen + 1, 'utf8');
  buf.fill('A', i, i + rlen, 'utf8');
  fs.writeFileSync("test.txt", buf);

  find("test.txt", [/BA*B/], (err, report) => {
    let found = false;
    for (let r = 0; r < report.length; r++) {
      if (report[r].isFound) {
        found = true;
      }
    }
    if (!found) console.log(`${i}: missing`);
    // Move on to the next offset until the pattern would run off the end.
    if (i < flen - rlen - 10) {
      test(i + 1);
    }
  });
};

test(10);
In other words, for each offset in the file from 10 bytes up to nearly 192KB, fill the file with spaces, then insert BAAAAAAAAAAB at the offset. Then search for it and report if it wasn't found.
That will churn for a while, then print:
65526: missing
65527: missing
65528: missing
65529: missing
65530: missing
65531: missing
65532: missing
65533: missing
65534: missing
65535: missing
65536: missing
I killed it at this point.
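When the regex match overlaps the block boundary, it's not matched. Every offset reported above places the 12-byte match across byte 65536, the boundary between the first and second 64KB blocks.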
You can solve this by gluing each consecutive pair of blocks together and searching the glued pair. You'll still be able to match everything up to 64KB long, and you'll be able to do it with a fixed amount of memory, since only two blocks are held at a time.
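Here is a rough sketch of that idea, not a patch against find-in itself. The helper name `findAcrossBlocks` and its `blockSize` parameter are made up for illustration, and it reads the file with plain `fs.readSync` rather than whatever stream find-in uses internally. It also assumes a regular file, where every read except the last returns a full block.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of find-in): returns true if `regex` matches
// anywhere in the file. Each block is searched glued to the block before it,
// so a match that straddles a block boundary is still seen.
function findAcrossBlocks(path, regex, blockSize = 64 * 1024) {
  const fd = fs.openSync(path, "r");
  try {
    const buf = Buffer.alloc(blockSize);
    let prev = ""; // previous block; empty before the first read
    let n;
    // position = null reads from the current file position and advances it
    while ((n = fs.readSync(fd, buf, 0, blockSize, null)) > 0) {
      // latin1 maps every byte to exactly one character, so a multi-byte
      // sequence cut by a block boundary cannot corrupt the decoding
      const cur = buf.toString("latin1", 0, n);
      regex.lastIndex = 0; // reset state in case the regex has the g or y flag
      if (regex.test(prev + cur)) return true;
      prev = cur; // keep only one old block, so memory stays bounded
    }
    return false;
  } finally {
    fs.closeSync(fd);
  }
}
```

With the test case above, `findAcrossBlocks("test.txt", /BA*B/)` could stand in for the `find` call to check whether the offsets around 65536 are still reported as missing. Most matches get examined twice (once in each pair that contains them), which is harmless for a found/not-found answer but would need de-duplicating if match positions were reported.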