medvedevgroup / howdesbt
Sequence Bloom Tree, supporting determined/how split filters
License: MIT License
(All of the following is hypothetical; there is currently no plan to do this work.)
Canonicalizing the hash function as h(forward)+h(revcomp) may introduce more collisions. One alternative would be something like min(h(forward),h(revcomp)), though this can skew the distribution of bits.
More testing would be needed to determine whether the current canonicalization is really a problem.
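To make the two options concrete, here is a minimal sketch. It assumes a stand-in hash (64-bit FNV-1a, which is not the hash HowDeSBT actually uses), and the names reverse_complement, toy_hash, canonical_sum and canonical_min are hypothetical. Both combinations give a k-mer and its reverse complement the same value. The minimum of two roughly uniform values is biased toward small values, which is the skew mentioned above.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Reverse complement of a DNA k-mer (A<->T, C<->G).
// Characters other than A, C, G, T are left unchanged.
std::string reverse_complement(const std::string& kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;
        }
    }
    return rc;
}

// Stand-in hash (64-bit FNV-1a). ASSUMPTION: used only for illustration,
// this is not the hash function HowDeSBT uses.
std::uint64_t toy_hash(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Canonicalization by sum: symmetric in the two strands (wraps modulo 2^64).
std::uint64_t canonical_sum(const std::string& kmer) {
    return toy_hash(kmer) + toy_hash(reverse_complement(kmer));
}

// Canonicalization by minimum: also symmetric, but biased toward small values.
std::uint64_t canonical_min(const std::string& kmer) {
    return std::min(toy_hash(kmer), toy_hash(reverse_complement(kmer)));
}
```

Whether either variant collides more often in practice is exactly what the testing mentioned above would have to measure.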
If it is a problem, any solution must remain backward compatible with existing files. The BF file format includes a version number, so newer versions of the program would recognize earlier file versions and keep using the current hash function with those. Newly created BF files would use the new hash function. Older versions of the program would reject the new files.
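The version check could look roughly like the sketch below. Everything in it is hypothetical: the version numbers, the HashScheme enum and scheme_for_version are illustrative names, and nothing here is taken from the actual BF file format.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical hash schemes: the current sum-based canonicalization
// and a possible min-based replacement.
enum class HashScheme { SumCanonical, MinCanonical };

// ASSUMPTION: illustrative version numbers, not the real BF format versions.
constexpr std::uint32_t kLegacyVersion = 1;
constexpr std::uint32_t kNewVersion = 2;

// Choose the hash scheme from the version stored in the file header.
// Existing files keep the current scheme; newly created files get the new one.
HashScheme scheme_for_version(std::uint32_t file_version) {
    if (file_version <= kLegacyVersion) return HashScheme::SumCanonical;
    if (file_version == kNewVersion) return HashScheme::MinCanonical;
    // A program that does not know this version rejects the file.
    throw std::runtime_error("unsupported BF file version");
}
```

An older program would only know the legacy version and would take the rejecting branch for any newer file.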
This is not actually an issue; it is more of a feature request.
The bfdistance command automatically computes the pairwise distance between all the BF files listed in --list=<filename>. This is great, but it is currently not possible to run it in parallel to speed up the whole process, which would be extremely helpful when dealing with hundreds of thousands of BFs. Parallelizing it is probably not straightforward, but the problem can be solved indirectly by addressing a different one.
In particular, let's say that I want to compute the distance between a single BF and a set of BFs. This is not currently possible, but it seems easy to implement, since bfdistance already accepts both one BF file <filename> and a list of BF files --list=<filename> as input. The idea is that it could compute the distance between <filename> and every file in --list=<filename> when both are specified (currently, it seems to ignore <filename> if --list=<filename> is also specified, but I may be wrong).
At this point, there would be no need to make the command itself parallel, since we could simply run multiple instances of howdesbt bfdistance, each with a different <filename> but all with the same --list=<filename>.
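The requested one-versus-many mode can be sketched as below. This is only an illustration: it assumes each filter is a single uncompressed bit vector stored as 64-bit words, and it uses Hamming distance as a stand-in for whatever distance bfdistance reports. BitVector, hamming_distance and one_vs_many are hypothetical names, not HowDeSBT code.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// ASSUMPTION: a filter is modeled as one plain bit vector of 64-bit words.
using BitVector = std::vector<std::uint64_t>;

// Number of bit positions at which the two vectors differ.
std::size_t hamming_distance(const BitVector& a, const BitVector& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("bit vectors differ in length");
    }
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        distance += std::bitset<64>(a[i] ^ b[i]).count();
    }
    return distance;
}

// Distance from a single query filter to every filter in a list,
// returned in the same order as the list.
std::vector<std::size_t> one_vs_many(const BitVector& query,
                                     const std::vector<BitVector>& others) {
    std::vector<std::size_t> distances;
    distances.reserve(others.size());
    for (const BitVector& other : others) {
        distances.push_back(hamming_distance(query, other));
    }
    return distances;
}
```

Because each call depends only on its own query and the shared list, separate processes could each take one query filter without any coordination.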
I noticed that you now check whether each input Bloom filter contains only one bit vector before applying a logical operator with the bfoperate subcommand.
This makes sense, but I don't understand why the condition on the number of bit vectors is > 2 for the second Bloom filter passed as input (bfB). Should it be > 1, as it already is for the first Bloom filter, bfA? The check appears in the following functions:
void BFOperateCommand::op_and()
Lines 243 to 246 in aaab732
void BFOperateCommand::op_or()
Lines 278 to 281 in aaab732
void BFOperateCommand::op_xor()
Lines 313 to 316 in aaab732
void BFOperateCommand::op_eq()
Lines 348 to 351 in aaab732
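For clarity, this is the behavior I would expect, written as a small sketch rather than as the actual HowDeSBT code. FilterInfo, require_single_bit_vector and check_operands are hypothetical names. The point is that both operands go through the same > 1 test, so a second filter with two bit vectors is rejected instead of slipping past a > 2 test.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a loaded Bloom filter: only the fields
// needed for the check are modeled here.
struct FilterInfo {
    std::string name;
    std::size_t num_bit_vectors;
};

// Reject any filter holding more than one bit vector.
void require_single_bit_vector(const FilterInfo& bf) {
    if (bf.num_bit_vectors > 1) {
        throw std::invalid_argument(bf.name + " contains more than one bit vector");
    }
}

// Apply the same condition to both operands before running an operator.
void check_operands(const FilterInfo& bfA, const FilterInfo& bfB) {
    require_single_bit_vector(bfA);
    require_single_bit_vector(bfB);
}
```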
This is not really an issue.
I'm developing a framework that uses HowDeSBT to build SBTs, and most users would find it easier to install HowDeSBT with conda.
Would it be possible to create a new tagged pre-release, so that I can update the conda recipe to point to a specific tag that includes the latest features (i.e., --rrr, --unrrr, and --list in the bfoperate subcommand)?