
sanjaynagi / rna-seq-pop

Stars: 17 · Watchers: 2 · Forks: 6 · Size: 102.03 MB

Snakemake workflow for Illumina RNA-sequencing experiments - extract population genomic signals from RNA-Seq data

Home Page: https://sanjaynagi.github.io/rna-seq-pop/

License: MIT License

Python 1.84% R 0.45% Jupyter Notebook 97.71%
snakemake-workflow rna-seq variant-calling differential-expression population-genomics snakemake transcriptomics workflow population-genetics selection

rna-seq-pop's Introduction

Hello there 👋

I'm Sanjay Curtis Nagi, a researcher studying the major malaria mosquito Anopheles gambiae 🦟

rna-seq-pop's People

Contributors

sanjaynagi

rna-seq-pop's Issues

get coverage script

  • check number and proportion of expressed genes
  • check number and proportion of genes with SNPs called
  • coverage across genome
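A minimal Python sketch of the first two checks follows. It assumes a gene-level count table held as a dict of gene ID to per-sample counts, plus a list of gene IDs that have at least one called SNP. The function names, toy gene IDs, and mean-count threshold of 10 are illustrative assumptions, not settings from the workflow. Coverage across the genome is not covered here.

```python
# Sketch only: toy data and an illustrative threshold, not workflow settings.


def expressed_gene_summary(counts, min_mean_count=10):
    """Return (number, proportion) of genes whose mean count across samples
    is at least min_mean_count. `counts` maps gene ID -> per-sample counts."""
    if not counts:
        return 0, 0.0
    n_expressed = sum(
        1
        for values in counts.values()
        if values and sum(values) / len(values) >= min_mean_count
    )
    return n_expressed, n_expressed / len(counts)


def snp_gene_summary(all_genes, genes_with_snps):
    """Return (number, proportion) of genes with at least one SNP called.
    Genes with SNPs that are absent from all_genes are ignored."""
    all_genes = set(all_genes)
    if not all_genes:
        return 0, 0.0
    hit = all_genes & set(genes_with_snps)
    return len(hit), len(hit) / len(all_genes)


if __name__ == "__main__":
    counts = {
        "gene1": [0, 2, 1],
        "gene2": [150, 98, 120],
        "gene3": [12, 9, 15],
        "gene4": [0, 0, 0],
    }
    print(expressed_gene_summary(counts))  # (2, 0.5)
    # One entry per called SNP, so a gene can appear more than once.
    print(snp_gene_summary(counts, ["gene2", "gene2", "gene3"]))  # (2, 0.5)
```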

Problem installing rna-seq-pop

Hi rna-seq-pop developer,

I am having a problem installing this pipeline using conda:
ls
config LICENSE README.md resources RNA-Seq-Pop-Logo.png workflow

snakemake --use-conda --cores 16
WorkflowError in line 13 of /scicomp/home-pure/qlk5/Desktop/important_programs/rna-seq-pop/workflow/Snakefile:
Workflow defines configfile config/config.yaml but it is not present or accessible.
File "/scicomp/home-pure/qlk5/Desktop/important_programs/rna-seq-pop/workflow/Snakefile", line 13, in

It looks like the workflow needs a file called config.yaml inside the config folder, but it does not exist; see below:
ls ../rna-seq-pop/o_config/
exampleconfig.yaml examplefastq.tsv examplesamples.tsv

Do you have any suggestions?

Cheers,
DD

Adapting workflow to other species

This looks like a comprehensive workflow. I'd like to use it to do DE analysis and variant calling for other species, like non-human malaria. What would I need to do to make that happen?

Change 'chrom' to 'contig'

We currently use 'chrom' in the workflow to mean chromosome.

'contig' is more accurate, given that we work with chromosomal arms in Anopheles, and it will align with the malariagen_data API.

Check for signals.tsv

Because it is an optional step, checkInputs.py needs to check for the signals file used by ag1000gsweepsDE.
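A sketch of how such a check might look follows. It assumes a hypothetical config layout in which an ag1000gSweeps section holds an activate flag and the path to the signals file. The key names and the function are illustrative only; this is not existing code from the workflow.

```python
import os


def check_signals_file(config):
    """Fail early if the optional ag1000gsweepsDE step is switched on but its
    signals file is missing or not given.

    The config keys ("ag1000gSweeps", "activate", "signals") are hypothetical
    placeholders for wherever the workflow stores this setting.
    """
    sweeps = config.get("ag1000gSweeps", {})
    if not sweeps.get("activate", False):
        return None  # optional step is off, so the file is not required
    path = sweeps.get("signals")
    if not path or not os.path.isfile(path):
        raise FileNotFoundError(
            f"ag1000gsweepsDE is enabled but the signals file {path!r} was not found"
        )
    return None
```

The check returns quietly when the step is disabled. Missing inputs therefore only stop the run when they would actually be needed.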

remove need for DE contrasts rule

The rule that sets up the contrasts table is unnecessary.

If the table is made incorrectly at first, Snakemake will not delete it, and it will affect future runs until the file is deleted manually. We could make config.yaml an input to that rule, but then any change to config.yaml would force all downstream rules to be re-run.

Ideally, the DE.contrasts.list file would be regenerated on every run without forcing other rules to re-run.

The alternative, which I think is better, is to use config.yaml directly: a function in tools.py could parse the contrasts in config.yaml straight into a table.
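Below is one way such a tools.py function could look. It is sketched under two assumptions: the contrasts live in a list under a hypothetical "contrasts" key of the loaded config, and the first treatment in each Control_Case string is the reference group. Snakemake exposes config.yaml as the global config dict, so the Snakefile could call this function directly instead of relying on an intermediate file.

```python
import csv
import sys


def contrasts_table(config):
    """Parse contrasts written as "Control_Case" strings into table rows.

    Assumes a hypothetical config["contrasts"] list and that the first
    treatment in each string is the reference (control) group.
    """
    rows = []
    for contrast in config.get("contrasts", []):
        parts = contrast.split("_")
        if len(parts) != 2:
            # Underscore syntax cannot handle treatment names containing "_".
            raise ValueError(f"contrast {contrast!r} must contain exactly one underscore")
        control, case = parts
        rows.append({"contrast": contrast, "control": control, "case": case})
    return rows


if __name__ == "__main__":
    # In a Snakefile, Snakemake's global `config` dict would be passed instead.
    example_config = {"contrasts": ["Kisumu_Tiassale"]}
    writer = csv.DictWriter(
        sys.stdout, fieldnames=["contrast", "control", "case"], delimiter="\t"
    )
    writer.writeheader()
    writer.writerows(contrasts_table(example_config))
```

The table is rebuilt in memory on every run. A stale contrasts file therefore cannot persist, and editing the contrasts does not by itself touch any file that downstream rules depend on.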

expand config.yaml

This needs to include settings such as the path to gene name information, the ploidy, and ideally an option to trim sequences.

change freebayes min depth param

Ideally this should be changed to 4, as previous studies have shown variant calling to be accurate above this depth.
Our missingness filters are already set high, so this probably won't make much difference.

Windowed Fst scan

Apply a windowed Hudson Fst scan in the same manner as the current PBS implementation.
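For reference, here is a pure-Python sketch of a windowed Hudson Fst. It uses the per-SNP estimator of Bhatia et al. (2013) and combines the SNPs in each window as a ratio of averages (sum of numerators over sum of denominators). It assumes biallelic SNPs, alternate-allele frequencies already computed for each population, and a fixed number of sampled chromosomes per population. It does not reproduce the workflow's PBS code. In practice, scikit-allel's allel.moving_hudson_fst performs a closely related windowed calculation directly from allele counts, which also handles per-site missingness.

```python
def hudson_fst_components(p1, n1, p2, n2):
    """Per-SNP numerator and denominator of Hudson's Fst for a biallelic site.

    p1, p2: alternate allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population (> 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_hudson_fst(freqs1, freqs2, n1, n2, window_size):
    """Hudson's Fst in non-overlapping windows of window_size SNPs.

    Each window is summarised as sum(numerators) / sum(denominators);
    a trailing partial window is dropped.
    """
    if len(freqs1) != len(freqs2):
        raise ValueError("frequency lists must have the same length")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two chromosomes per population")
    results = []
    for start in range(0, len(freqs1) - window_size + 1, window_size):
        num_sum = den_sum = 0.0
        window = zip(freqs1[start:start + window_size], freqs2[start:start + window_size])
        for p1, p2 in window:
            num, den = hudson_fst_components(p1, n1, p2, n2)
            num_sum += num
            den_sum += den
        results.append(num_sum / den_sum if den_sum > 0 else float("nan"))
    return results
```

Fixed differences between populations give Fst = 1. Identical frequencies give a value slightly below zero, because the estimator corrects for sampling noise.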

allow fastq files to be specified in units.tsv

Make a units.tsv file with three columns (sample, read1, read2) that is automatically read into Snakemake.

Optionally, allow either rigid (current) or flexible file naming by using an input function.
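A minimal sketch of reading such a file with Python's csv module follows. It assumes a tab-separated units.tsv with a header of exactly sample, read1, read2 and one row per sample. The helper names are hypothetical. In a Snakefile, the input function would be used as input: lambda wildcards: fastq_input(units, wildcards.sample), which is what allows arbitrary FASTQ file names.

```python
import csv


def read_units(path):
    """Read a tab-separated units.tsv with columns sample, read1, read2
    into a dict mapping sample -> (read1, read2)."""
    units = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            sample = row["sample"]
            if sample in units:
                # One row per sample is assumed; lanes would need a unit column.
                raise ValueError(f"duplicate sample {sample!r} in {path}")
            units[sample] = (row["read1"], row["read2"])
    return units


def fastq_input(units, sample):
    """Return [read1, read2] for one sample, for use in an input function."""
    return list(units[sample])
```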

Change syntax for contrasts

Currently, we use underscores in the config.yaml when specifying a contrast (e.g. Kisumu_Tiassale), but this means that underscores cannot be used in treatment names. It's not the end of the world, but this is probably a fairly common use case.

We could use lists instead, e.g. ['Kisumu', 'Tiassale']. I would prefer to keep manual specification of contrasts, as it allows for the most flexibility.

It would require some minor editing of almost every script in the workflow.
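A small sketch of the list syntax follows. A YAML entry such as - [Kisumu, Tiassale] is loaded as a Python list, so each treatment name stays intact. The helper name is hypothetical. Treating the first item as the control is an assumption. Tiassale_G3 is an invented treatment name, used only to show the ambiguity of the underscore form.

```python
def parse_contrast(contrast):
    """Return (control, case) from a contrast given as a two-item list,
    e.g. ["Kisumu", "Tiassale"]; names may contain underscores."""
    if not isinstance(contrast, (list, tuple)) or len(contrast) != 2:
        raise ValueError(f"contrast must be a two-item list, got {contrast!r}")
    control, case = contrast
    if control == case:
        raise ValueError(f"contrast compares {control!r} with itself")
    return control, case


if __name__ == "__main__":
    # Current syntax: splitting on "_" cannot tell "Kisumu" vs "Tiassale_G3"
    # apart from "Kisumu_Tiassale" vs "G3".
    print("Kisumu_Tiassale_G3".split("_"))  # ['Kisumu', 'Tiassale', 'G3']
    # List syntax keeps each treatment name intact.
    print(parse_contrast(["Kisumu", "Tiassale_G3"]))  # ('Kisumu', 'Tiassale_G3')
```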

Documentation

We need to fully write up the methods, along with a more detailed explanation of how to use and configure the pipeline.

edit scripts to take snakemake params

  • edit scripts to take Snakemake parameters directly, e.g. snakemake@input[[1]] in R and snakemake.input[0] in Python
  • change rules to use 'script' instead of 'shell'
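A sketch of the pattern for a Python script follows, using a hypothetical rule and script name. Under the script directive, Snakemake makes a snakemake object available to the script; in R the equivalent access is snakemake@input[[1]]. Keeping the logic in a main() function means the script can also be tested outside Snakemake.

```python
# Hypothetical script (e.g. workflow/scripts/summarise_counts.py) run by a
# rule such as:
#
#   rule summarise_counts:
#       input: "results/counts.tsv"
#       output: "results/counts_summary.tsv"
#       script: "../scripts/summarise_counts.py"
#
# With `script`, paths come from snakemake.input / snakemake.output rather
# than from command-line arguments built in a `shell` string.


def main(input_path, output_path):
    """Count the data rows (excluding the header) in input_path and write the total."""
    with open(input_path) as handle:
        n_rows = max(0, sum(1 for _ in handle) - 1)
    with open(output_path, "w") as out:
        out.write(f"n_rows\t{n_rows}\n")


if "snakemake" in globals():
    # Only true when executed through Snakemake's script directive.
    main(snakemake.input[0], snakemake.output[0])  # noqa: F821
```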
