imagoxv / nanoasv

NanoASV official repo

License: GNU General Public License v3.0

Dockerfile 9.33% Shell 67.14% R 23.53%
amplicon-sequencing asv bioinformatics bioinformatics-pipeline docker end-to-end metabarcoding metabarcoding-pipeline nanopore nanopore-analysis-pipeline nanopore-minion nanopore-reads nanopore-sequencing singularity workflow

nanoasv's People

Contributors

frederic-mahe, imagoxv

nanoasv's Issues

Checkpoint system

I need to add a checkpoint system so that an analysis can be resumed after an error without recomputing the steps that already completed.

Likewise, I should add an R_ONLY option that reruns only the phyloseq step, in case the metadata.csv file was not working.
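A minimal sketch of what such a checkpoint system could look like in bash, assuming one marker file per completed step. The names `run_step` and `CHECKPOINT_DIR` are hypothetical and are not taken from the NanoASV code.

```shell
#!/usr/bin/env bash
# Hypothetical checkpoint helper; names and layout are assumptions, not NanoASV's actual code.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-.nanoasv_checkpoints}"

# run_step NAME COMMAND [ARGS...]
# Runs COMMAND only if no marker file exists for NAME, and writes the marker on success.
run_step() {
    local name="$1"
    shift
    mkdir -p "${CHECKPOINT_DIR}"
    if [ -e "${CHECKPOINT_DIR}/${name}.done" ]; then
        echo "Skipping ${name} (already completed)"
        return 0
    fi
    if "$@"; then
        touch "${CHECKPOINT_DIR}/${name}.done"
    else
        echo "Step ${name} failed; rerun to resume from here" >&2
        return 1
    fi
}
```

Each pipeline step would then be wrapped, for example `run_step trimming some_trimming_command ...`, so that a second run skips every step whose marker already exists. An R_ONLY mode could be approximated the same way by deleting only the marker of the R step.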

Produce phylo tree for phyloseq object

Need to convert SILVA to single-line FASTA, directly in the Dockerfile, because that is the right moment to do it.

Need to extract the reference sequences to build a tree with FastTree.

Need to feed the Rscript with the tree and then inject it into the phyloseq object.

Almost there
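The single-line conversion mentioned above can be sketched with awk. This is an illustration only: the function name `fasta_singleline` is hypothetical, and it assumes a plain FASTA file with `>` header lines.

```shell
# fasta_singleline: reads multi-line FASTA on stdin and writes one sequence line per record.
# Hypothetical helper, not part of NanoASV.
fasta_singleline() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }  # flush previous record, print header
        { seq = seq $0 }                                          # append wrapped sequence lines
        END { if (seq != "") print seq }                          # flush the last record
    '
}
```

It could be called as `fasta_singleline < input.fasta > output.fasta` (file names are placeholders). With one sequence per line, extracting a reference sequence by header becomes a simple two-line lookup.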

No control over Porechop CPU usage

Porechop uses as many CPUs as possible, no matter what parallelization I wrote.

It might detect the available cores and use them regardless of what is requested. It might even do that in parallel somehow.

Still, it seems to be a fairly lightweight computation.

Subsampling before chimera detection

Chimera detection seems slow for some deeply sequenced barcodes.
I need to add a subsampling step before chimera detection, maybe something like --subsampling XX * 2 to leave some buffer sequences.
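One possible reading of the "XX * 2" idea, sketched in bash: keep twice the requested number of reads before chimera detection. The function name is hypothetical. The sketch assumes four-line FASTQ records and simply takes the first reads; a random subsample may be preferable.

```shell
# subsample_with_buffer N: keeps the first 2*N FASTQ records read from stdin.
# Hypothetical helper; assumes four-line FASTQ records (no wrapped sequences).
subsample_with_buffer() {
    local n="$1"
    local buffered=$(( n * 2 ))      # requested depth plus an equal-sized buffer
    head -n "$(( buffered * 4 ))"    # four lines per FASTQ record
}
```

For example, `subsample_with_buffer 50000 < barcode.fastq > barcode.sub.fastq` would keep up to 100000 reads (file names are placeholders), leaving room for the sequences discarded as chimeras.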

First test on real dataset

The first test on a real dataset ran into memory issues for some barcodes.
The dataset has to be subsampled.

Taxonomic richness seems to have increased compared to my previous treatment. I have to investigate.

Not running on aarch64

I cannot make it run on the MK1C because chopper was compiled for amd64.

Need to find a way

Binary release?

Hi @frederic-mahe, I tried to make a release to see how it works. However, my binary is too big (~5 GB). The maximum allowed is 2 GB.

Any idea how to overcome this?

Arthur

Need to specify software versions for reproducibility

Need to pin the versions in the Dockerfile installation steps so that the same tool versions are always used.

  • bwa Version: 0.7.17-r1188 (might consider upgrading to a more recent one)
  • Chopper v0.7.0
  • fasttree: only one version? Might consider using FastTree2
  • MAFFT v7.490 (2021/Oct/30)
  • Porechop-0.2.4
  • R version 4.1.2 (2021-11-01) -- "Bird Hippie"
  • samtools 1.13
  • vsearch v2.21.1_linux_x86_64

I should probably pin the library versions as well.

Need to update bwa to bwa-mem2

I used bwa for simplicity's sake, but now I need to switch to bwa-mem2, which is supposed to be faster and more memory-efficient.

Indexing parallelisation?

I wonder whether it is possible to parallelize the indexing step (which is clearly the most compute-intensive part of the build process).

Singularity keeps looking for bin outside the container

This drives me crazy.

On the IFB cluster, running NanoASV with Singularity gives:

/shared/software/modules/4.6.1/init/bash: line 37: /usr/bin/tclsh: No such file or directory

The whole purpose of a container is to NOT LOOK OUTSIDE OF IT, isn't it?

Chimera detection #2

vsearch never seems to detect chimeras with default parameters.

I think this is because the sequences are not dereplicated and therefore have no "count" section in the FASTA header.
However, I think dereplication might not work because vsearch expects 100% similarity, which is rarely (if ever) achieved with nanopore amplicon sequencing.
Efficient dereplication would require accepting a certain variability threshold, which would amount to clustering. Such clustering with vsearch performs well with --id 0.7, which is significantly lower than what we would want to accept for dereplication. And with clustering, it is no longer an ASV treatment.

I need to discuss it with you @frederic-mahe

Docker image run with Singularity: phyloseq R package error "cannot open shared object"

Error: package or namespace load failed for ‘phyloseq’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/imago/R/x86_64-pc-linux-gnu-library/4.3/stringi/libs/stringi.so':
  libicui18n.so.70: cannot open shared object file: No such file or directory
Execution halted

Something might have changed.

I'm pretty sure that's because of ubuntu:latest

Gotta switch to ubuntu 22.04

I'm sure at some point it was specified. IDK what happened
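To catch this kind of regression early, a small check could flag an unpinned base image in the Dockerfile. This is only a sketch: the function name `find_unpinned_base` is hypothetical, and the pattern is a simplification that does not handle registry addresses with a port.

```shell
# find_unpinned_base DOCKERFILE: prints FROM lines that use the :latest tag or no tag at all.
# Hypothetical helper; returns a non-zero status when every base image is pinned.
find_unpinned_base() {
    grep -E '^FROM[[:space:]]+[^[:space:]:]+(:latest)?([[:space:]]|$)' "$1"
}
```

Running `find_unpinned_base Dockerfile` before a build would print a line such as `FROM ubuntu:latest`, while a pinned `FROM ubuntu:22.04` produces no output.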

Running nanoasv smoothly

It seems that Singularity is called automatically when running nanoasv, which makes
singularity run nanoasv --options unnecessary. If the nanoasv Singularity file is executable, then just run ./nanoasv, or nanoasv if you put it in /opt/ and add /opt/ to the $PATH

A nice way to do it

echo 'export PATH=$PATH:/opt/' >> ~/.bashrc && source ~/.bashrc

which makes

~$ nanoasv 
WARNING: could not mount /etc/localtime: not a directory
 ______________________________________
/ Error: -d needs an argument, I don't \
\ know where your sequences are.       /
 --------------------------------------
        \   ^__^
         \  (xx)\_______
            (__)\       )\/\
             U  ||----w |
                ||     ||

Lovely

aarch64 - MK1C fails on minimal dataset

Step 4/9 : Adapter trimming with Porechop
Step 5/9 : Subsampling
Step 6/9 : Reads alignements with bwa against SILVA_138.1
environment: line 1:   218 Segmentation fault      (core dumped) bwa mem ${DB}/SILVA_IDX "${FILE}" 2> /dev/null > "${FILE}.sam"
environment: line 1:   221 Segmentation fault      (core dumped) bwa mem ${DB}/SILVA_IDX "${FILE}" 2> /dev/null > "${FILE}.sam"
Step 7/9 : Skipped - no unknown sequence
Step 8/9 : Phylogeny with MAFFT and FastTree
Step 9/9 : Phylosequization with R and phyloseq
Data treatment is over.
NanoASV took 144 seconds to perform.

This indicates a memory-related error.

If only the MK1C wasn't running dozens of useless jobs in the background.

I'll find a way
