Artificial_metagenomes_variableSize

Pipeline for the generation of artificial metagenomes containing a variable number of organisms. Parallelized for HPC.

The complete pipeline generates K artificial metagenomes at a given sequencing depth (number of reads). The number of organisms in each metagenome is chosen at random between a minimum and a maximum defined by the user. If SQUEUE is set to TRUE, the abundance distribution is skewed: 1/5 of the organisms represent around 60% of the total population. If SQUEUE is set to FALSE, the abundance of each organism in the metagenome is obtained by dividing the total number of reads by the number of organisms, adding Gaussian noise, and normalizing.
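The two abundance modes described above can be illustrated with a minimal sketch. It is not the pipeline's own code: the function names, the `noise_sd` parameter, and the choice to apply noise proportionally to the equal share are assumptions made for illustration, and only the standard library is used so the example stays self-contained.

```python
import random


def even_abundances(n_organisms, total_reads, noise_sd=0.1, rng=None):
    """Near-homogeneous read counts: equal share, Gaussian noise, renormalization."""
    rng = rng or random.Random()
    base = total_reads / n_organisms
    # Perturb each equal share with Gaussian noise; clamp so no value is <= 0.
    noisy = [max(base * (1 + rng.gauss(0, noise_sd)), 1e-9) for _ in range(n_organisms)]
    # Rescale so the abundances sum back to the requested number of reads.
    scale = total_reads / sum(noisy)
    return [value * scale for value in noisy]


def skewed_abundances(n_organisms, total_reads, dominant_fraction=0.2,
                      dominant_share=0.6, noise_sd=0.1, rng=None):
    """Skewed read counts: about 1/5 of the organisms receive about 60% of the reads."""
    rng = rng or random.Random()
    n_dominant = max(1, round(n_organisms * dominant_fraction))
    n_rest = n_organisms - n_dominant
    if n_rest == 0:
        # Too few organisms to split into two groups; fall back to the even mode.
        return even_abundances(n_organisms, total_reads, noise_sd, rng)
    dominant = even_abundances(n_dominant, total_reads * dominant_share, noise_sd, rng)
    rest = even_abundances(n_rest, total_reads * (1 - dominant_share), noise_sd, rng)
    # The dominant organisms come first in the returned list.
    return dominant + rest
```

With `n_organisms=10` and `total_reads=1000`, the skewed mode gives the first two organisms a combined 600 reads and the remaining eight the other 400. The returned values are floats, so a real run would still need to round them to whole read counts.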

Requirements

Python 3

This pipeline requires Python 3 and the following packages:

  • numpy (https://www.numpy.org/)
  • scikit-learn, imported as sklearn (https://scikit-learn.org/stable/)

GemSim

This pipeline requires GemSim to be downloaded and installed on the HPC.

Quick start

Edit the scripts/config.sh file

Please modify the following attributes:

  • GEMSIM : path to the GemSim directory
  • LIST_GENOMES : list of genomes that will make up the artificial metagenomes
  • GENOMES_DIR : path to the directory where the genomes are stored. Beware: GemSim will not work unless this path is relative to $GEMSIM
  • RESULT_DIR : path to the output directory
  • REL_OUT : relative path to the output directory, used by GemSim
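Because GENOMES_DIR has to be given relative to $GEMSIM, it can help to compute that relative form from two absolute paths before editing the config. The sketch below assumes that "relative to $GEMSIM" means a path expressed from the GemSim directory. The helper name and the example paths are hypothetical and are not part of the pipeline.

```python
import posixpath


def path_relative_to(base_dir, target_dir):
    """Return target_dir expressed relative to base_dir (both absolute POSIX paths)."""
    if not (posixpath.isabs(base_dir) and posixpath.isabs(target_dir)):
        raise ValueError("both paths must be absolute")
    return posixpath.relpath(target_dir, start=base_dir)


if __name__ == "__main__":
    gemsim = "/opt/GemSIM"  # hypothetical install location
    # A directory inside the GemSim directory:
    print(path_relative_to(gemsim, "/opt/GemSIM/genomes"))
    # A directory elsewhere on the file system:
    print(path_relative_to(gemsim, "/scratch/user/results"))
```

For these hypothetical paths, the first call yields `genomes` and the second yields `../../scratch/user/results`.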

Parameters

  • NB_READS : number of reads to generate

  • NB_METAGENOMES : number of metagenomes to generate

  • MODEL : error/length model to use

  • MINI : minimum number of organisms in the metagenomes

  • MAXI : maximum number of organisms in the metagenomes

  • SQUEUE : if set to TRUE, the abundance distribution is skewed: 1/5 of the organisms represent around 60% of the total population. If set to FALSE, the abundance distribution of the organisms is almost homogeneous.

  • OUTNAME : name to use for the output files

  • MAIL_USER : your arizona.edu email address

  • GROUP : your group affiliation
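As an illustration of how NB_METAGENOMES, MINI, and MAXI interact, the sketch below draws a random community size for each metagenome and then picks that many genomes from the list. It is a hedged example and not the pipeline's implementation: the function name is hypothetical, and sampling genomes without replacement is an assumption.

```python
import random


def draw_compositions(genomes, nb_metagenomes, mini, maxi, seed=None):
    """Pick, for each metagenome, a random set of between mini and maxi genomes."""
    if not 1 <= mini <= maxi <= len(genomes):
        raise ValueError("need 1 <= mini <= maxi <= number of available genomes")
    rng = random.Random(seed)
    compositions = []
    for _ in range(nb_metagenomes):
        size = rng.randint(mini, maxi)  # inclusive on both ends
        # Assumption: each genome appears at most once per metagenome.
        compositions.append(rng.sample(genomes, size))
    return compositions


if __name__ == "__main__":
    genomes = [f"genome_{i}.fasta" for i in range(20)]  # placeholder file names
    for members in draw_compositions(genomes, nb_metagenomes=3, mini=5, maxi=10, seed=1):
        print(len(members), members)
```

Passing a `seed` makes the draw reproducible, which is convenient when the same set of artificial metagenomes has to be regenerated.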

You can also modify:

  • BIN : your own bin directory
  • MAIL_TYPE : the mail type option. Set to "bea" by default.
  • QUEUE : the submission queue. Set to "standard" by default.

Run pipeline

Run:

./submit.sh

This command places two jobs in the queue.

Contributors

aponsero
