
karyon's Introduction

Karyon


This repository contains the Karyon pipeline.

Introduction

Karyon is a pipeline for the assembly and analysis of highly heterozygous genomes. It uses redundans (Pryszcz & Gabaldón, 2016) to reduce heterozygosity during the assembly process, and then maps the original libraries against the reduced assembly to analyze the distribution of heterozygous regions. With this information, it generates a series of plots that help researchers form informed hypotheses about the architecture of their genomes.
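To make the idea of a "distribution of heterozygous regions" concrete, here is a minimal sketch that counts heterozygous SNP calls per fixed-size window from VCF text. It is an illustration only and not Karyon's own implementation: the function name, the window size, the single-sample assumption and the example records are all made up for this example.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}


def het_snps_per_window(vcf_lines, window_size=1000):
    """Count heterozygous SNP calls per (contig, window index).

    Hypothetical helper: assumes a single-sample VCF whose FORMAT
    column contains a GT field. Indels and symbolic alleles are skipped.
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column to read a genotype from
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep only single-base substitutions.
        if ref not in BASES or not set(alt.split(",")) <= BASES:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype call
        if len(set(alleles)) > 1:  # more than one distinct allele: heterozygous
            counts[(chrom, (pos - 1) // window_size)] += 1
    return counts


# Made-up records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "scaffold_1\t150\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
    "scaffold_1\t900\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:28",
    "scaffold_1\t1200\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/1:31",
    "scaffold_1\t1500\t.\tT\tTA\t50\tPASS\t.\tGT:DP\t0/1:29",
]
window_counts = het_snps_per_window(example_vcf, window_size=1000)
for (contig, window), n in sorted(window_counts.items()):
    print(contig, window * 1000 + 1, n)
```

Windows with many heterozygous calls next to windows with none are the kind of pattern the Karyon plots are meant to expose; the real pipeline combines this with coverage and other signals.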

Scripts contained in this repository:

  1. karyon.py -> complete pipeline, including genome assembly, assembly reduction, SNP calling and plot generation
  2. prepare_libraries.py -> Karyon dependency. It uses Trimmomatic () to trim input libraries before genome assembly.
  3. spades_recipee.py -> Karyon dependency. It generates a file that launches dipSPAdes () with the input.
  4. varcall_recipee.py -> Karyon dependency. It generates a file that launches all steps in the SNP calling pipeline.
  5. karyonplots.py -> Karyon dependency. It generates all the plots as part of the Karyon pipeline.
  6. allplots.py -> Standalone version of karyonplots.py. It allows the user to input karyon results to generate the plots again.
  7. nQuire_plot.py -> It allows the user to run the local ploidy plot alone.
  8. Dockerfile -> Docker file involved in building the image to run Karyon
  9. install.sh -> Bash script required to install the remaining dependencies in the dockerfile
  10. redundans_env.yml & busco_env.yml -> Conda environments in YAML format required to install some of the trickiest dependencies.

How to install it

  • Quick start You can install it using the standard installation or through Docker.
  1. Standard installation

Follow these steps:

# First clone the Karyon repository
git clone https://github.com/Gabaldonlab/karyon.git
# Change to karyon/scripts directory
cd karyon/scripts/
# Then, run the installation script.
bash installation.sh
  2. Docker installation

To run this container you need Docker installed.

  • Use the Dockerfile build
# From the karyon git directory
docker build --no-cache -t cgenomics/karyon:1.2 .
# Start the container and indicate a volume and a container name
docker run -dit --name=karyon -v $(pwd):/root/src/karyon/shared --rm cgenomics/karyon:1.2
# Install all the necessary dependencies inside the running container. First, open an interactive shell in the container
docker exec -it karyon bash
# Change to the shared Karyon volume in the container, where the Dockerfile is located
cd /root/src/karyon/shared/
# Run the dependency installation script
bash scripts/docker_install.sh
  • Pull the docker image from Docker Hub

Pull the cgenomics/karyon image from Docker Hub:

# First pull the image
docker pull cgenomics/karyon:1.2
# Start the container and indicate a volume and a container name
docker run -dit --name=karyon -v $(pwd):/root/src/karyon/shared --rm cgenomics/karyon:1.2
# Install all the necessary dependencies inside the running container. First, open an interactive shell in the container
docker exec -it karyon bash
# Change to the shared Karyon volume in the container, where the Dockerfile is located
cd /root/src/karyon/shared/
# Run the dependency installation script
bash scripts/docker_install.sh

Test dataset

The test dataset consists of two sequencing libraries from NCBI SRA corresponding to Lichtheimia ramosa B5399, one of the strains analyzed in the main publication.

# Open an interactive shell in the Docker container
docker exec -it karyon bash
# Configure SRA tools within the docker container
~/src/karyon/shared/dependencies/sratoolkit.3.0.0-ubuntu64/bin/vdb-config --interactive
# Download the SRA libraries at the desired location
cd /root/src/karyon/shared/
~/src/karyon/shared/dependencies/sratoolkit.3.0.0-ubuntu64/bin/fastq-dump --split-files SRR974799 SRR974800
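After the download finishes, it can be worth confirming that every expected FASTQ file is present before starting the pipeline. The sketch below assumes both runs are paired-end, in which case fastq-dump --split-files writes one `<accession>_1.fastq` and one `<accession>_2.fastq` per run. The helper names are hypothetical and not part of Karyon.

```python
from pathlib import Path


def expected_fastq_files(accessions, directory="."):
    """List the files fastq-dump --split-files should write for paired-end runs."""
    directory = Path(directory)
    return [
        directory / f"{accession}_{mate}.fastq"
        for accession in accessions
        for mate in (1, 2)
    ]


def missing_fastq_files(accessions, directory="."):
    """Return the expected files that are absent or empty."""
    return [
        path
        for path in expected_fastq_files(accessions, directory)
        if not path.is_file() or path.stat().st_size == 0
    ]


if __name__ == "__main__":
    # Accessions and download location taken from the commands above.
    runs = ["SRR974799", "SRR974800"]
    missing = missing_fastq_files(runs, "/root/src/karyon/shared")
    if missing:
        print("Missing or empty files:")
        for path in missing:
            print("  ", path)
    else:
        print("All expected FASTQ files are present.")
```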

Manual

Please check the manual for comprehensive instructions on using Karyon.

Authors

  • Miguel Ángel Naranjo Ortiz - Pipeline work - MANaranjo
  • Manuel Molina Marín - Docker work - manumolina
  • Diego Fuentes Palacios - Docker work & testing - dfupa
  • Toni Gabaldón - Intellectual design & validation - tgabaldon

License

This project is licensed under the GNU General Public License - see the LICENSE.md file for details.


karyon's Issues

Extraction of nQuire plotting script

Hi, I think that the nQuire ploidy/CN plots look great. Would you be able to pull them out as a standalone script I could use, please?

I've spent a lot of time trying to get the allplots.py script running but have had no luck yet.

Many thanks

Mike

Running allplots.py

Hi, I really like the nQuire plots included in this, and think those, among the others, could be really useful. I have the required files for the allplots.py script (vcf, pileup, etc.) but can't get it working.

I am getting the following error message:

Traceback (most recent call last):
  File "bin/allplots.py", line 100, in <module>
    config_dict["KAT"][0], 
KeyError: 'KAT'

I am running the following command:

python3 allplots.py \
-f redundans_reduced.fasta \
-v file.vcf -p file.mpileup \
-b file.bam \
-l library_to_make_fasta.fastq \
-o out_file

I have installed with both lightinstall.sh and install.sh and am getting the same message.

Can you help? I will try the Docker image in the meantime. Many thanks.

Running the variant calling pipeline and plots with existing assembly

Hello,

I would like to use the pipeline for the mapping, variant calling and plotting, but I would like to skip the first parts (assembly and reduction) and proceed directly to those steps. My assembly is already in a good state. However, bin/varcall_recipee.py does not come with documentation of the required arguments. Are the scripts also available as standalone versions, or does one need to run the entire pipeline?

Thanks
Alex

installing and running just the plotting part

Hello, I am trying to install karyon. I am installing it on a cluster environment (without admin rights or apt or apt-get) mostly using conda. I am running into small things I am able to resolve, but I thought I should keep track of them as it might be useful for you.

Points 1-6 are just some small observations I ran into while going through the manual and trying to get the plots done.
Point 7 is something I would like to bring to your attention, because I am reviewing the manuscript right now and the software should work beforehand.

  1. I think this line in the installation script should be

    conda install -c bioconda -y sra-tools

instead. Because as it is, I am getting

conda install: error: argument -c/--channel: expected one argument

  2. I installed GATK4 via conda instead

    conda install -c bioconda gatk4

  3. I think sudo chmod 777 nQuire can be simply chmod 777 nQuire (sudo is unnecessary)

  4. At this point, I gave up on most of the software, because all I wanted to generate are the karyon plots for the already assembled genome. It would be handy to have a reduced installation procedure for people who would like to do the genome assembly on their own.

  5. Some parts of the manual pdf are not well typeset: some of the spaces in the code are missing, so copy-pasting a snippet messes up the command. Oddly enough, only some parts of the manual file have this problem.

  6. all_plots.py in the manual should be allplots.py (at least that's the name of the script). Furthermore, the script requires a -p <pileup> file, while the manual instructs to specify -m example.mpileup (though -p example.mpileup actually did the job). It also complained when I tried to give it two sequencing files (R1 and R2).

  7. Then I finally managed to get it to run, but then I got

python3 install/karyon/bin/allplots.py -f data/reference/genome.fa -v freebayes_Afus1_raw.vcf -p Afus1.mpileup -b data/mapped_reads/Afus1.rg.sorted.rmdup.bam -l data/trimmed_reads/Afus1/ERR5959256-trimmed-pair1.fastq.gz -o data/Afus1/karyon_plots
Traceback (most recent call last):
  File "/scratch/kjaron/install/karyon/bin/allplots.py", line 93, in <module>
    allplots(window_size, 
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 496, in allplots
    os.mkdir(kitchen)
FileNotFoundError: [Errno 2] No such file or directory: '/home/karyon/tmp/6X0UH5'

I have not installed all the components, just those I thought would be necessary for the analysis (nQuire and all the Python libraries). This is something I would like to propose as well: could you perhaps create simpler installation instructions for people interested in the ploidy analysis and plots from existing assembly/vcf files?
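The traceback above is consistent with os.mkdir being asked to create a directory whose parent (here /home/karyon/tmp) does not exist, since os.mkdir creates only the final path component. Whether that is the actual cause inside karyonplots.py is an assumption. The sketch below reproduces the behaviour in a temporary directory and shows os.makedirs as one possible workaround; the helper name make_scratch_dir is hypothetical.

```python
import os
import tempfile


def make_scratch_dir(base_dir, name):
    """Create base_dir/name, including any missing parent directories."""
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as root:
    # Stand-in for a parent such as /home/karyon/tmp that was never created.
    missing_parent = os.path.join(root, "karyon", "tmp")

    try:
        os.mkdir(os.path.join(missing_parent, "6X0UH5"))
        mkdir_failed = False
    except FileNotFoundError:
        # os.mkdir does not create intermediate directories.
        mkdir_failed = True

    scratch = make_scratch_dir(missing_parent, "6X0UH5")
    scratch_exists = os.path.isdir(scratch)

print("os.mkdir failed on missing parent:", mkdir_failed)
print("os.makedirs created the directory:", scratch_exists)
```

Until the script itself handles this, creating the parent directory by hand before running allplots.py may be enough to get past the error, provided the path is writable on the system in question.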
