
jalal246 / find-in


Yet another tool, written in JS for Searching Text in Files!

Home Page: https://jalal246.github.io/find-in/

License: MIT License

JavaScript 100.00%
Topics: fs, filesystem, finder, stream, chunk, regex, callback, promise, async, search

find-in's Introduction

Ceasefire Now

Hello there 👋👋👋

I am a Software Engineer focused on the frontend. I like risky and ambitious ideas, and I am very interested in making the impossible possible and in making a difference that matters. I have invested a lot in JavaScript and I am willing to continue to do so. I love, like really really love, open-source projects. You can find the projects I've made below.

Do you have an interesting idea? Reach out to me and let's have a chat.


find-in's Issues

Failure to match regexes on chunk boundaries

The block size for the stream is probably 64KB or so, so a regex match on the order of 16 bytes should always work, since it is far smaller than the block size.

However, whether it works depends greatly on the location in the file. Test case:

const find = require("find-in")['default'];
const fs = require("fs");

let flen = 64 * 1024 * 3;
let rlen = 10;

let test = (i) => {
    let buf = Buffer.alloc(flen, ' ', 'utf8');
    buf.fill('B', i - 1, i + rlen + 1, 'utf8');
    buf.fill('A', i, i + rlen, 'utf8');
    fs.writeFileSync("test.txt", buf);
    find("test.txt", [/BA*B/], (err, report) => {
        let found = false;
        for (let r = 0; r < report.length; r++) {
            if (report[r].isFound) {
                found = true;
            }
        }
        if (!found) console.log(`${i}: missing`);
        if (i < flen - rlen - 10) {
            test(i + 1);
        }
    });
};

test(10);

In other words, for each offset in the file from 10 bytes up to nearly 192KB: fill the file with spaces, insert BAAAAAAAAAAB at that offset, search for it with /BA*B/, and report the offset if the match wasn't found.

That will churn for a while, then print:

65526: missing
65527: missing
65528: missing
65529: missing
65530: missing
65531: missing
65532: missing
65533: missing
65534: missing
65535: missing
65536: missing

I killed it at this point.

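Every missing offset puts the 12-byte string across byte 65536, the 64KB mark. So when the regex match overlaps the block boundary, it's not matched.

You can solve this by gluing each pair of consecutive blocks together and searching the combined text. You'll still be able to match everything up to 64KB long, and you'll be able to do this with a fixed amount of memory, since only two blocks are held at a time.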
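
A minimal sketch of that block-gluing idea, not find-in's actual implementation. `findAcrossChunks` is a hypothetical helper, and it assumes the chunks have already been decoded to strings of roughly equal size:

```javascript
// Hypothetical helper (not part of find-in): scan an iterable of string
// chunks, always testing the previous chunk glued to the current one, so a
// match that straddles a chunk boundary is still seen.
function findAcrossChunks(chunks, regex) {
  // Drop the g/y flags so regex.test() keeps no lastIndex state between calls.
  const re = new RegExp(regex.source, regex.flags.replace(/[gy]/g, ""));
  let previous = "";
  for (const chunk of chunks) {
    // At most two chunks are held in memory at any time.
    if (re.test(previous + chunk)) return true;
    previous = chunk;
  }
  return false;
}
```

With equal-sized chunks, any match no longer than one chunk lies entirely inside some pair of consecutive chunks, so it is found. Longer matches can still be missed.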

Failure to match long regexes

Test case:

const find = require("find-in")['default'];
const fs = require("fs");

let flen = 64 * 1024 * 3;
let rlen = 64 * 1024;

let test = (i) => {
    let buf = Buffer.alloc(flen, ' ', 'utf8');
    buf.fill('B', i - 1, i + rlen + 1, 'utf8');
    buf.fill('A', i, i + rlen, 'utf8');
    fs.writeFileSync("test.txt", buf);
    find("test.txt", [/BA*B/], (err, report) => {
        let found = false;
        for (let r = 0; r < report.length; r++) {
            if (report[r].isFound) {
                found = true;
            }
        }
        if (!found) console.log(`${i}: missing`);
    });
};

test(10);

This probably can't be fixed to any reasonable degree. However, you could use a fixed chunk size or let people pass it in, advertising explicitly that you won't be able to handle anything longer than that.
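
One way a caller-supplied chunk size might look, as a rough sketch rather than find-in's API. `readFixedChunks` and `findWithChunkSize` are hypothetical names. The sketch reads synchronously to stay short, decodes as latin1 on the assumption that one byte should map to one character, and assumes the regex has no `g` or `y` flag. For streamed reads, Node's `fs.createReadStream` accepts a `highWaterMark` option that sets the chunk size in a similar way.

```javascript
const fs = require("fs");

// Hypothetical helper: yield the file as strings of at most chunkSize bytes.
function* readFixedChunks(file, chunkSize) {
  const fd = fs.openSync(file, "r");
  const buf = Buffer.alloc(chunkSize);
  try {
    let bytesRead;
    // A null position reads from the current file position and advances it.
    while ((bytesRead = fs.readSync(fd, buf, 0, chunkSize, null)) > 0) {
      yield buf.toString("latin1", 0, bytesRead);
    }
  } finally {
    fs.closeSync(fd);
  }
}

// Hypothetical helper: search pairs of consecutive chunks. Matches up to
// chunkSize bytes long are found; longer matches may be missed.
function findWithChunkSize(file, regex, chunkSize) {
  let previous = "";
  for (const chunk of readFixedChunks(file, chunkSize)) {
    if (regex.test(previous + chunk)) return true;
    previous = chunk;
  }
  return false;
}
```

The documented limit is then simply the `chunkSize` argument: a caller who expects long matches passes a larger value and pays for it in memory.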
