shamir-lab / faucet

This is the codebase for Faucet, described in our manuscript: https://academic.oup.com/bioinformatics/article/34/1/147/4004871, by Roye Rozov, Gil Goldshlager, Eran Halperin, and Ron Shamir

License: BSD 3-Clause "New" or "Revised" License

C++ 96.81% C 0.56% Makefile 1.86% TeX 0.56% Shell 0.22%
de-novo-assembly metagenomes streaming-algorithms

faucet's People

Contributors

ggoldsh, luizirber, rozovr

faucet's Issues

mixed lengths or reads too long

A crash occurs when running on the file KPN11_interleaved.fa.
Run as: /usr/bin/time -o test.log ./mink -read_load_file $eran/recycle_paper_data/KPN11_interleaved.fa -read_scan_file $eran/recycle_paper_data/KPN11_interleaved.fa -size_kmer 31 -max_read_length 100 -estimated_kmers 8000000 -file_prefix $eran/mink_data/test --two_hash --paired_ends

The file has reads up to 250 bp, while -max_read_length was set to 100. Does the max spacer need to be set differently?

junc distances messed up

In tip removal, we want to project the coverage values of a removed tip onto the remaining (highest-coverage) extension, up to the length of the tip. When printing contigJunc values, I see odd values for distances (e.g., 184 on a 32 bp contig), and I believe having these updates enabled caused a segfault. Will double check.

better documentation & consistency needed in command line options

  • it is unclear whether the order of options matters
  • when starting by loading the BF or the junctions, one requires the full file name while the other takes just a prefix
  • it is not clear which command line options are essential for users and which are for developer/debugging use

JunctionMap bug in building linear regions

A segfault arises when running on the file from ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA010/SRA010896/SRX016231/SRR034939_2.fastq.bz2
It occurs during contig map construction, while building linear regions, around lines 140-150 (depending on which comments are in), at "Junction nextJunc = *getJunction(result.kmer);"

Improve readability of extension logic

I think we should define constants A, C, G, T, BACK to replace the extension indices 0, 1, 2, 3, 4 throughout the code. This would make a lot of calls less obscure. In particular, BfSearchResult(kmer, true, BACK, 2, contig) is a lot clearer than BfSearchResult(kmer, true, 4, 2, contig). In the latter case the 4, 2 could be swapped to 2, 4, and a casual reader would have no idea anything had gone wrong.

Another option would be an enum type. It is worth looking into whether that has drawbacks compared to an int. If there are none, an enum would be even better, since the compiler could then catch an error such as swapping BACK and 2 above, because BACK would not be an int.
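
A minimal sketch of the enum option, assuming C++11 or later. The names Extension, toIndex, and SearchResultSketch are hypothetical and only stand in for the real code. The actual BfSearchResult signature is not shown here, so the sketch models only the two integer-like arguments discussed above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical scoped enum for the five extension indices named in this issue.
enum class Extension : int { A = 0, C = 1, G = 2, T = 3, BACK = 4 };

// Explicit conversion for places that still index arrays by extension.
constexpr int toIndex(Extension e) { return static_cast<int>(e); }

// Hypothetical stand-in for a constructor that takes an extension index
// followed by a plain int; it is NOT the real BfSearchResult.
struct SearchResultSketch {
    Extension index;
    int value;  // meaning of this second argument is assumed, not known
    SearchResultSketch(Extension idx, int v) : index(idx), value(v) {}
};

// A scoped enum does not convert implicitly to or from int ...
static_assert(!std::is_convertible<int, Extension>::value,
              "int must not convert implicitly to Extension");
static_assert(!std::is_convertible<Extension, int>::value,
              "Extension must not convert implicitly to int");

// ... so swapping the two arguments is rejected at compile time.
static_assert(std::is_constructible<SearchResultSketch, Extension, int>::value,
              "correct argument order compiles");
static_assert(!std::is_constructible<SearchResultSketch, int, Extension>::value,
              "swapped argument order does not compile");
```

With this sketch, SearchResultSketch(Extension::BACK, 2) compiles while SearchResultSketch(2, Extension::BACK) does not. One possible drawback compared to plain int constants is that every array access needs an explicit conversion such as toIndex(Extension::BACK); an unscoped enum would avoid that cast but would also convert to int silently, which loses the check against a swapped enum argument landing in an int parameter.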

integration test before pushing

We need to aggregate a set of tests and be able to run through all of them whenever a significant change is introduced, i.e., every time a commit is made (or, at the very least, before pushing).

larger k values wanted

k is currently limited to 32. We would like to run with k = 31, 63, and 127 and combine the results with, e.g., the metassembler tool.
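
For context, a limit of 32 is what one would expect if each k-mer is packed into a single 64-bit word at 2 bits per base. Whether faucet stores k-mers this way is an assumption, not something stated in this issue. The sketch below uses hypothetical names (baseCode, packKmer) and only illustrates that packing scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed encoding: A=0, C=1, G=2, T=3 (2 bits per base).
// Returns -1 for any other character.
inline int baseCode(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Packs a k-mer into one 64-bit word, first base in the highest used bits.
// Returns false if the k-mer is empty, longer than 32 bases (more than
// 64 bits would be needed), or contains a non-ACGT character.
inline bool packKmer(const std::string& kmer, std::uint64_t& out) {
    if (kmer.empty() || kmer.size() > 32) return false;
    std::uint64_t word = 0;
    for (char b : kmer) {
        const int code = baseCode(b);
        if (code < 0) return false;
        word = (word << 2) | static_cast<std::uint64_t>(code);
    }
    out = word;
    return true;
}
```

Under this assumed scheme, k = 63 or k = 127 would need a wider representation, for example an array of two or four 64-bit words per k-mer.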
