lskatz / fasten

:construction_worker: Fasten toolkit, for streaming operations on fastq files

License: MIT License

Rust 77.40% Shell 20.45% Dockerfile 0.57% TeX 0.96% Makefile 0.62%
bioinformatics fastq-files rust

fasten's Issues

[suggestion] Document, eventually enforce, input format

I would document whether each tool accepts both FASTA and FASTQ or only FASTQ as input, including in its description (the --help page).

It's self-explanatory that quality-related functions do not work on FASTA files. Still, some tools like fasten_sort do not work on FASTA files either, and there's no clear mention of this and no input validation.

I understand that keeping things simple can sometimes be the right thing to do; in this case, just a line in the help text would suffice :)
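
For what it's worth, a minimal sketch of what lightweight input validation could look like, assuming the format can be guessed from the first byte of the stream ('>' for FASTA, '@' for FASTQ). The names SeqFormat and sniff_format are made up for illustration and are not part of fasten:

```rust
use std::io::{self, BufRead};

/// Input formats a tool might accept (hypothetical names, not fasten's).
#[derive(Debug, PartialEq)]
pub enum SeqFormat {
    Fasta,
    Fastq,
}

/// Guess the format from the first byte without consuming any input:
/// '>' suggests FASTA, '@' suggests FASTQ, anything else is rejected.
pub fn sniff_format<R: BufRead>(reader: &mut R) -> io::Result<SeqFormat> {
    // fill_buf() only peeks at buffered data; it does not advance the reader.
    let first = reader.fill_buf()?.first().copied();
    match first {
        Some(b'>') => Ok(SeqFormat::Fasta),
        Some(b'@') => Ok(SeqFormat::Fastq),
        Some(other) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unrecognized first byte {:?}; expected '>' or '@'", other as char),
        )),
        None => Err(io::Error::new(io::ErrorKind::UnexpectedEof, "empty input")),
    }
}
```

A FASTQ-only tool could call this once on its locked stdin and exit with a clear message when it sees SeqFormat::Fasta, instead of failing later in a less obvious way.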

fasten_mutate should always give -m mutations?

For fasten_mutate, it is not clear whether the number of SNPs is guaranteed or is just a maximum.

From the tests it looks like the second:

for i in {1..30}; 
do
  ./target/debug/fasten_mutate < testdata/four_reads.fastq  --snps 5 -m | awk 'NR % 4 == 2' | \
   sed 's/[a-z]//g' | perl -ne 'chomp; if (length($_)!=5){die "Error: unexpected SNPs count: $_\n"}'; 
done

If this is the intended behaviour, I would state so in the help page:

 -s, --snps INT      Number of SNPs (point mutations) to include per read.

linked to #20
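
To make the "guaranteed" reading concrete, here is a rough sketch of one way to get exactly n changed positions: sample positions without replacement and always substitute a base that differs from the original. This is not how fasten_mutate is implemented; XorShift64 and mutate_exact are hypothetical names, and the tiny generator is only there so the example needs no external crates:

```rust
/// Tiny xorshift64 generator for illustration only (seed must be non-zero).
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Mutate exactly `n` distinct positions of a read (or every position if
/// the read is shorter than `n`). Each chosen base is replaced by a
/// different base, so the count is guaranteed rather than an upper bound.
pub fn mutate_exact(seq: &[u8], n: usize, rng: &mut XorShift64) -> Vec<u8> {
    const BASES: [u8; 4] = *b"ACGT";
    let mut out = seq.to_vec();
    let mut positions: Vec<usize> = (0..seq.len()).collect();
    let n = n.min(positions.len());
    // Partial Fisher-Yates shuffle: the first n entries become a random
    // sample of positions without replacement.
    for i in 0..n {
        let j = i + (rng.next_u64() as usize) % (positions.len() - i);
        positions.swap(i, j);
    }
    for &pos in &positions[..n] {
        // Only offer bases that differ from the current one.
        let choices: Vec<u8> = BASES.iter().copied().filter(|b| *b != out[pos]).collect();
        out[pos] = choices[(rng.next_u64() as usize) % choices.len()];
    }
    out
}
```

If positions were instead drawn with replacement, or a base could be replaced by itself, the number of visible SNPs would only be an upper bound, which is the behaviour the loop above seems to detect.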

Feature Request: handle seqlen != quallen

Can you explicitly state (perhaps in the README or help menu) which tests are used for FASTQ validation in fasten_validate? As far as I can tell from the source, the sequence length (seqlen) and quality length (quallen) aren't compared, or am I wrong?

I've only ever seen 2 options to handle this issue when Field 2 and Field 4 lengths aren't the same:

  1. repair: trim both fields to the shorter length
  2. clean/remove: discard the read entirely

Some users might find both options useful, exposed through something like --seq-and-qual-len-diff [repair,remove], but I think remove is the safer and better option for quality concerns if only one can be implemented. Removing a read leaves its mate without a partner, though, so it might be tougher to implement.
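
A minimal sketch of the two policies, assuming records are held as owned strings. LenMismatch, Record, and reconcile are hypothetical names used only for illustration; nothing here reflects fasten's actual internals or flags:

```rust
/// Hypothetical policies mirroring the repair/remove idea.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LenMismatch {
    Repair,
    Remove,
}

/// One FASTQ record held as owned strings.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub id: String,
    pub seq: String,
    pub qual: String,
}

/// Returns Some(record) to keep it (possibly trimmed) or None to discard it.
pub fn reconcile(mut rec: Record, policy: LenMismatch) -> Option<Record> {
    if rec.seq.len() == rec.qual.len() {
        return Some(rec); // lengths agree, nothing to do
    }
    match policy {
        LenMismatch::Repair => {
            // Assumes ASCII sequence and quality lines, so byte length
            // equals character count and truncate() cannot split a char.
            let keep = rec.seq.len().min(rec.qual.len());
            rec.seq.truncate(keep);
            rec.qual.truncate(keep);
            Some(rec)
        }
        LenMismatch::Remove => None,
    }
}
```

For paired-end input, a caller using the remove policy would also have to drop the mate whenever either read of the pair comes back as None.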

JOSS Review

Hello pips,
I am opening this issue to keep track of the review.

Benchmark: version of the tools
I started reproducing the benchmark and wanted to know if you could share the exact versions of the programs used in it. I made an environment with the latest versions of Fasten, SeqTK, Seqkit, SeqFu, and fastx, but maybe it's better to synchronise this (and to mention the versions used in the paper).


openjournals/joss-reviews#6030

Clarify README summary

I think the summary would help potential users understand the package better if "random operations" and "secure your analysis" were reworded or expanded to be clearer.

This covers the "statement of need" for documentation in the JOSS checklist: openjournals/joss-reviews#6030

Consider std::io::Stdin instead of File('/dev/stdin')

I'm not sure if there's a performance difference (in which case please feel free to add a comment and disregard this), but I think opening /dev/stdin instead of using std::io::Stdin might limit the platforms fasten works on to only POSIX ones (i.e. probably not Windows). Not a high priority, but an easy search/change and probably a good beginner bug.
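
A minimal sketch of the change, assuming the reading code can be written against any Read implementation rather than a File. The function count_records is a made-up example, not an existing fasten function:

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Count FASTQ records from any reader, assuming four lines per record.
pub fn count_records<R: Read>(input: R) -> io::Result<usize> {
    let reader = BufReader::new(input);
    let mut lines = 0usize;
    for line in reader.lines() {
        line?; // propagate I/O or UTF-8 errors
        lines += 1;
    }
    Ok(lines / 4)
}
```

From main, this could be called as count_records(std::io::stdin().lock()) rather than count_records(File::open("/dev/stdin")?), which avoids depending on the /dev/stdin path existing.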

fastq/fasta mergepe

Hi team,

I wonder whether there could be a merge command like seqtk mergepe to interleave and deinterleave FASTQ/FASTA files. I imagine it could be very useful.

Thanks,

Jianshu
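
For illustration, a rough sketch of the interleaving half of this request, assuming plain four-line FASTQ records and two inputs with the same number of reads. The functions read_record and interleave are hypothetical and not taken from fasten or seqtk:

```rust
use std::io::{self, BufRead, Write};

/// Read one four-line FASTQ record; Ok(None) means a clean end of input.
fn read_record<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut record = String::new();
    for i in 0..4 {
        if reader.read_line(&mut record)? == 0 {
            if i == 0 {
                return Ok(None);
            }
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "truncated FASTQ record",
            ));
        }
    }
    // The final line of a file may lack a trailing newline.
    if !record.ends_with('\n') {
        record.push('\n');
    }
    Ok(Some(record))
}

/// Write R1 and R2 records alternately; returns the number of pairs written.
pub fn interleave<R1: BufRead, R2: BufRead, W: Write>(
    mut r1: R1,
    mut r2: R2,
    mut out: W,
) -> io::Result<usize> {
    let mut pairs = 0usize;
    loop {
        match (read_record(&mut r1)?, read_record(&mut r2)?) {
            (Some(a), Some(b)) => {
                out.write_all(a.as_bytes())?;
                out.write_all(b.as_bytes())?;
                pairs += 1;
            }
            (None, None) => return Ok(pairs),
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "R1 and R2 contain different numbers of reads",
                ))
            }
        }
    }
}
```

Deinterleaving would be the mirror image: read records one at a time and alternate between two writers.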

add progress meter

A simple script that reads in reads and prints them out again, while printing progress to stderr.
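
A minimal sketch of such a pass-through, assuming four-line FASTQ records. The function name pass_through and the every parameter (report after every N reads) are made up for illustration:

```rust
use std::io::{self, BufRead, Write};

/// Copy FASTQ lines from `input` to `output` unchanged, and write a short
/// message to `progress` after every `every` reads (0 disables reporting).
/// Returns the number of complete records seen.
pub fn pass_through<R: BufRead, W: Write, P: Write>(
    input: R,
    mut output: W,
    mut progress: P,
    every: usize,
) -> io::Result<usize> {
    let mut records = 0usize;
    for (i, line) in input.lines().enumerate() {
        let line = line?;
        writeln!(output, "{}", line)?;
        // The fourth line of each record completes one read.
        if i % 4 == 3 {
            records += 1;
            if every > 0 && records % every == 0 {
                writeln!(progress, "processed {} reads", records)?;
            }
        }
    }
    Ok(records)
}
```

In a real tool the three arguments would be locked stdin, stdout, and stderr, so the meter never mixes with the reads flowing down the pipe.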

fasten_trim reporting what it trimmed

Have an option to write what fasten_trim removes to a file or files. Maybe use an output directory, since these will undoubtedly just be intermediate files.
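
One possible shape for this, sketched under the assumption that trimming keeps a single contiguous range of each read. TrimReport and trim_with_report are hypothetical names, and the coordinates here do not correspond to fasten_trim's actual options:

```rust
/// What a trim step kept and what it cut from each end.
#[derive(Debug, PartialEq)]
pub struct TrimReport<'a> {
    pub removed_left: &'a str,
    pub kept: &'a str,
    pub removed_right: &'a str,
}

/// Keep bases in the half-open range [first, last_exclusive).
/// Out-of-range bounds are clamped to the read length.
/// Assumes an ASCII sequence, so byte offsets equal base offsets.
pub fn trim_with_report(seq: &str, first: usize, last_exclusive: usize) -> TrimReport<'_> {
    let end = last_exclusive.min(seq.len());
    let start = first.min(end);
    TrimReport {
        removed_left: &seq[..start],
        kept: &seq[start..end],
        removed_right: &seq[end..],
    }
}
```

The kept part would go to stdout as usual, while the two removed parts could be written to files in the requested output directory.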

use subcommand instead of so many commands

Hi fasten team,

In some cases I have limited disk space in the default Rust binary install path, so I have to compile fasten and put the binaries in a user path, and handling so many separate commands is inconvenient. Would it be possible to offer each command as a subcommand of a single binary, as seqkit does? That feels more natural to me. clap 4.5 can also be used for this.

Thanks,

Jianshu
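
To illustrate the idea, a bare-bones sketch of subcommand dispatch using only the standard library (a real implementation would more likely use clap, as suggested). The dispatch function and its output strings are hypothetical; only the tool names sort, trim, and validate come from the existing fasten_* commands mentioned in these issues:

```rust
/// Hypothetical dispatcher for `fasten <subcommand> [args...]`.
/// `args` excludes the program name. It only describes what would be run.
pub fn dispatch(args: &[String]) -> Result<String, String> {
    match args.split_first() {
        None => Err("usage: fasten <subcommand> [args...]".to_string()),
        Some((cmd, rest)) => match cmd.as_str() {
            // Each arm would call into the code behind the matching tool.
            "sort" | "trim" | "validate" => Ok(format!(
                "would run fasten_{} with {} argument(s)",
                cmd,
                rest.len()
            )),
            other => Err(format!("unknown subcommand: {}", other)),
        },
    }
}
```

From main, the arguments could be collected with std::env::args().skip(1).collect::<Vec<String>>() and passed straight in, giving one binary to install instead of many.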

docs on what fasten_validate does?

I'm curious as to what fasten_validate checks on fastq files. I think it would be useful to have a little blurb or bullet points on what is validated when running that tool.

Just a thought, nothing urgent!
