howdesbt's People

Contributors

eseiler, pashadag, rsharris

howdesbt's Issues

Hash collision rate may be higher than it ought to be

(All of the following is hypothetical work; there is currently no plan to do it.)

Canonicalizing the hash function as h(forward)+h(revcomp) may be introducing more collisions than necessary. One alternative would be something like min(h(forward),h(revcomp)), though this can skew the distribution of bits.

More testing would be needed to determine whether the current canonicalization is really a problem.

If it is a problem, any solution needs to remain backward compatible so that existing files stay usable. The BF file format includes a version number. Newer versions of the program would recognize the earlier file versions and keep using the existing hash function for those. Newly created BF files would use the new hash function. Older versions of the program would reject the new files.
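To make the two canonicalization options concrete, here is a minimal sketch of both. It assumes a stand-in hash (64-bit FNV-1a, called toy_hash here), which is not the hash HowDeSBT uses; revcomp, canonical_sum, and canonical_min are likewise illustrative names. The sketch only shows that both variants give the same value for a k-mer and its reverse complement; it does not measure collision rates.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Reverse complement of a DNA string; characters other than A/C/G/T become 'N'.
std::string revcomp(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Stand-in hash (64-bit FNV-1a). Assumption: NOT HowDeSBT's hash function.
std::uint64_t toy_hash(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Sum-based canonicalization: h(forward) + h(revcomp), wrapping mod 2^64.
std::uint64_t canonical_sum(const std::string& kmer) {
    return toy_hash(kmer) + toy_hash(revcomp(kmer));
}

// Min-based canonicalization: min(h(forward), h(revcomp)).
std::uint64_t canonical_min(const std::string& kmer) {
    return std::min(toy_hash(kmer), toy_hash(revcomp(kmer)));
}
```

Both functions are symmetric in the two strands, so either could serve as a canonical hash. Whether one collides more often than the other is exactly what the testing mentioned in this issue would have to establish.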
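The version-based compatibility scheme described above could look roughly like the following sketch. The version numbers (1 for existing files, 2 for files written with a new hash) and all names are assumptions made for illustration; the issue does not state the actual values in the BF header.

```cpp
#include <cstdint>

enum class HashScheme { LegacySum, NewMin };

// Hypothetical version numbers; the real header values are not given in the issue.
constexpr std::uint32_t kLegacyVersion  = 1;
constexpr std::uint32_t kNewHashVersion = 2;

// Decide which hash a program should use for a file.
// maxSupported is the newest file version this build of the program knows.
// Returns false when the file must be rejected (unknown or too-new version).
bool scheme_for_file(std::uint32_t fileVersion,
                     std::uint32_t maxSupported,
                     HashScheme& scheme) {
    if (fileVersion < kLegacyVersion || fileVersion > maxSupported)
        return false;
    scheme = (fileVersion >= kNewHashVersion) ? HashScheme::NewMin
                                              : HashScheme::LegacySum;
    return true;
}
```

With this shape, a newer program (maxSupported = 2) reads both old and new files with the matching hash, while an older program (maxSupported = 1) rejects version 2 files instead of querying them with the wrong hash.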

bfdistance: one filter vs a set of filters

This is not really an issue; it is more of a feature request.

The bfdistance command automatically computes the pairwise distance between all the BF files listed in --list=<filename>. This is great, but it is currently not possible to parallelize it to speed up the whole process, which would be extremely helpful when dealing with hundreds of thousands of BFs. Parallelizing it directly is probably not straightforward, but the problem can be solved indirectly by addressing a different one.

In particular, let's say that I want to compute the distance between a single BF and a set of BFs. This is not currently possible, but it seems easy to implement, since bfdistance already accepts both one BF file <filename> and a list of BF files --list=<filename> as input. The idea is that it could compute the distance between <filename> and every file in --list=<filename> whenever both are specified (currently it seems to ignore <filename> if --list=<filename> is also specified, but I may be wrong).

At that point there would be no need to parallelize bfdistance itself, since we could simply run multiple instances of howdesbt bfdistance, each with a different <filename> but all with the same --list=<filename>.
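The requested one-versus-many mode can be sketched as follows. This is not HowDeSBT code: it assumes bit vectors held in memory as 64-bit words and uses Hamming distance (the number of differing bits) as a stand-in, since the issue does not say which distance bfdistance reports. BitVector, hamming_distance, and one_vs_many are hypothetical names.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

using BitVector = std::vector<std::uint64_t>;

// Number of differing bits between two bit vectors.
// Assumes equal lengths; any extra words in the longer vector are ignored.
std::size_t hamming_distance(const BitVector& a, const BitVector& b) {
    std::size_t d = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        d += std::bitset<64>(a[i] ^ b[i]).count();
    return d;
}

// Distance from one query filter to each filter in a list.
// Each call is independent, so separate processes could handle different queries.
std::vector<std::size_t> one_vs_many(const BitVector& query,
                                     const std::vector<BitVector>& others) {
    std::vector<std::size_t> out;
    out.reserve(others.size());
    for (const BitVector& o : others)
        out.push_back(hamming_distance(query, o));
    return out;
}
```

Because one_vs_many touches only the query and the shared list, running it once per query in separate processes would cover the full pairwise matrix, which is the indirect parallelism the issue is asking for.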

bfoperate: bloom filters with more than one bit vector

I noticed that the bfoperate subcommand now checks that each input bloom filter contains only one bit vector before applying a logical operator.

This makes sense, but I don't understand why the condition on the number of bit vectors is > 2 for the second bloom filter passed as input (bfB). Shouldn't it be > 1, as it already is for the first bloom filter, bfA?

void BFOperateCommand::op_and()

HowDeSBT/cmd_bf_operate.cc

Lines 243 to 246 in aaab732

if (bfA->numBitVectors > 1)
    fatal ("error: \"" + bfFilenames[0] + "\" contains more than one bit vector");
if (bfB->numBitVectors > 2)
    fatal ("error: \"" + bfFilenames[1] + "\" contains more than one bit vector");

void BFOperateCommand::op_or()

HowDeSBT/cmd_bf_operate.cc

Lines 278 to 281 in aaab732

if (bfA->numBitVectors > 1)
    fatal ("error: \"" + bfFilenames[0] + "\" contains more than one bit vector");
if (bfB->numBitVectors > 2)
    fatal ("error: \"" + bfFilenames[1] + "\" contains more than one bit vector");

void BFOperateCommand::op_xor()

HowDeSBT/cmd_bf_operate.cc

Lines 313 to 316 in aaab732

if (bfA->numBitVectors > 1)
    fatal ("error: \"" + bfFilenames[0] + "\" contains more than one bit vector");
if (bfB->numBitVectors > 2)
    fatal ("error: \"" + bfFilenames[1] + "\" contains more than one bit vector");

void BFOperateCommand::op_eq()

HowDeSBT/cmd_bf_operate.cc

Lines 348 to 351 in aaab732

if (bfA->numBitVectors > 1)
    fatal ("error: \"" + bfFilenames[0] + "\" contains more than one bit vector");
if (bfB->numBitVectors > 2)
    fatal ("error: \"" + bfFilenames[1] + "\" contains more than one bit vector");
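One way to keep the two thresholds from drifting apart is to route both operands through a single check. The sketch below illustrates that idea under assumptions: BloomFilterStub is a hypothetical stand-in that models only the numBitVectors field quoted above, and the helper throws std::runtime_error rather than calling the project's fatal() so that the example is self-contained.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the real bloom filter class; only the field
// quoted in the issue is modeled.
struct BloomFilterStub {
    unsigned numBitVectors;
};

// Reject any operand that holds more than one bit vector.
// Throws instead of calling fatal() to keep the sketch self-contained.
void require_single_bit_vector(const BloomFilterStub& bf,
                               const std::string& filename) {
    if (bf.numBitVectors > 1)
        throw std::runtime_error("error: \"" + filename
                                 + "\" contains more than one bit vector");
}

// Apply the same threshold to both operands; true when both pass.
bool operands_ok(const BloomFilterStub& bfA, const BloomFilterStub& bfB,
                 const std::string& nameA, const std::string& nameB) {
    try {
        require_single_bit_vector(bfA, nameA);
        require_single_bit_vector(bfB, nameB);
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

With a shared helper, a second operand holding two bit vectors is rejected just like the first, which is the behavior the issue suggests for op_and, op_or, op_xor, and op_eq.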

Building a pre-release with new features (--rrr, --unrrr, and --list in bfoperate subcommand)

This is not really an issue.

I'm developing a framework that uses HowDeSBT for building SBTs, and for most users it would certainly be easier to install HowDeSBT with conda.

I was wondering whether it would be possible to create a new tagged pre-release, so that I could easily update the conda recipe to point to a specific tag that includes the latest features (i.e., --rrr, --unrrr, and --list in the bfoperate subcommand).
