This is the Karyon Pipeline

License: GNU General Public License v3.0

Karyon

This repository contains the Karyon pipeline.

Introduction

Karyon is a pipeline for the assembly and analysis of highly heterozygous genomes. It uses redundans (Pryszcz & Gabaldón, 2016) to reduce heterozygosity during the assembly process, and then maps the original libraries against the reduced assembly to analyze the distribution of heterozygous regions. With this information, it generates a series of plots that can help researchers form informed hypotheses about the architecture of their genomes.

Scripts contained in this repository:

karyon.py -> Complete pipeline, including genome assembly, assembly reduction, SNP calling and plot generation.
prepare_libraries.py -> Karyon dependency. It uses Trimmomatic to trim input libraries before genome assembly.
spades_recipee.py -> Karyon dependency. It generates a file that launches dipSPAdes with the input.
varcall_recipee.py -> Karyon dependency. It generates a file that launches all steps in the SNP calling pipeline.
karyonplots.py -> Karyon dependency. It generates all the plots as part of the Karyon pipeline.
allplots.py -> Standalone version of karyonplots.py. It allows the user to input Karyon results to generate the plots again.
nQuire_plot.py -> It allows the user to run the local ploidy plot alone.
Dockerfile -> Dockerfile used to build the image that runs Karyon.
install.sh -> Bash script required to install the remaining dependencies in the Dockerfile.
redundans_env.yml & busco_env.yml -> Conda environments in YAML format required to install some of the trickiest dependencies.

How to install it

Quick start

You can install Karyon using the standard installation or through Docker.

Standard installation

Follow these steps:

# First clone the Karyon repository
git clone https://github.com/Gabaldonlab/karyon.git
# Change to karyon/scripts directory
cd karyon/scripts/
# Then, run the installation script.
bash installation.sh

Docker installation

To run this container you need Docker installed.

Use the Dockerfile build

# From the karyon git directory
docker build --no-cache -t cgenomics/karyon:1.2 .
# Start the container and indicate a volume and a container name
docker run -dit --name=karyon -v $(pwd):/root/src/karyon/shared --rm cgenomics/karyon:1.2
# Install the necessary dependencies inside the running container. First, open an interactive shell in the container
docker exec -it karyon bash
# Change to the karyon volume inside the container, where the Dockerfile is located
cd /root/src/karyon/shared/
# Run the dependency installation script
bash scripts/docker_install.sh

Pull the docker image from Docker Hub

Pull gabaldonlab/karyon from the Docker repository:

# First pull the image
docker pull cgenomics/karyon:1.2
# Start the container and indicate a volume and a container name
docker run -dit --name=karyon -v $(pwd):/root/src/karyon/shared --rm cgenomics/karyon:1.2
# Install the necessary dependencies inside the running container. First, open an interactive shell in the container
docker exec -it karyon bash
# Change to the karyon volume inside the container, where the Dockerfile is located
cd /root/src/karyon/shared/
# Run the dependency installation script
bash scripts/docker_install.sh

Test dataset

The test dataset is composed of two sequencing libraries from NCBI SRA corresponding to Lichtheimia ramosa B5399, one of the strains analyzed in the main publication.

# Execute interactively the docker container
docker exec -it karyon bash
# Configure SRA tools within the docker container
~/src/karyon/shared/dependencies/sratoolkit.3.0.0-ubuntu64/bin/vdb-config --interactive
# Download the SRA libraries at the desired location
cd /root/src/karyon/shared/
~/src/karyon/shared/dependencies/sratoolkit.3.0.0-ubuntu64/bin/fastq-dump --split-files SRR974799 SRR974800
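
The download step above can also be scripted. The following is a minimal Python sketch, not part of Karyon: the helper names are hypothetical, the fastq-dump path is copied from the commands above, and the output names assume that --split-files writes <accession>_1.fastq and <accession>_2.fastq for paired libraries.

```python
import os
import subprocess

# Path copied from the commands above; adjust it if sratoolkit lives elsewhere.
FASTQ_DUMP = os.path.expanduser(
    "~/src/karyon/shared/dependencies/sratoolkit.3.0.0-ubuntu64/bin/fastq-dump"
)
ACCESSIONS = ["SRR974799", "SRR974800"]


def fastq_dump_command(accessions, fastq_dump=FASTQ_DUMP):
    """Build the argument list for one fastq-dump call (hypothetical helper)."""
    return [fastq_dump, "--split-files", *accessions]


def expected_fastq_files(accessions, outdir="."):
    """File names assumed for paired libraries dumped with --split-files."""
    return [
        os.path.join(outdir, f"{acc}_{mate}.fastq")
        for acc in accessions
        for mate in (1, 2)
    ]


def download(accessions, outdir="/root/src/karyon/shared/"):
    """Run fastq-dump in outdir and return the expected files that are missing."""
    subprocess.run(fastq_dump_command(accessions), cwd=outdir, check=True)
    return [
        path
        for path in expected_fastq_files(accessions, outdir)
        if not os.path.exists(path)
    ]
```

Calling download(ACCESSIONS) inside the container would run the same fastq-dump command as above and return an empty list if all four expected FASTQ files were written.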

Manual

Please check the manual for comprehensive instructions on using Karyon.

Authors

Miguel Ángel Naranjo Ortiz - Pipeline work - MANaranjo
Manuel Molina Marín - Docker work - manumolina
Diego Fuentes Palacios - Docker work & testing - dfupa
Toni Gabaldón - Intellectual design & validation - tgabaldon

License

This project is licensed under the GNU General Public License - see the LICENSE.md file for details.

karyon's Issues

Extraction of nQuire plotting script

Hi, I think that the nQuire ploidy/CN plots look great. Would you be able to pull them out as a standalone script I could use, please?

I've spent a lot of time trying to get the allplots.py script running but have had no luck yet.

Many thanks

Mike

Running allplots.py

Hi, I really like the nQuire plots included in this, and think those, among the others, could be really useful. I have the required files for the allplots.py script (vcf, pileup, etc.) but can't get it working.

I am getting the following error message:

Traceback (most recent call last):
  File "bin/allplots.py", line 100, in <module>
    config_dict["KAT"][0], 
KeyError: 'KAT'

I am running the following command:

python3 allplots.py \
-f redundans_reduced.fasta \
-v file.vcf -p file.mpileup \
-b file.bam \
-l library_to_make_fasta.fastq \
-o out_file

I have installed with both lightinstall.sh and install.sh and am getting the same message.

Can you help? I will try the Docker image in the meantime. Many thanks.
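
The KeyError in the traceback suggests that the configuration read by allplots.py has no 'KAT' entry, which could happen if KAT is not listed in the configuration file; that cause is an assumption, not something the report confirms. A minimal Python sketch of a lookup that fails with a more readable message is shown below. The helper get_program_path is hypothetical, and the dict-of-lists layout is inferred only from the expression config_dict["KAT"][0].

```python
def get_program_path(config_dict, program):
    """Return the first value stored for `program`, or raise a readable error.

    Assumes config_dict maps program names to lists, as the expression
    config_dict["KAT"][0] in the traceback suggests. Hypothetical helper,
    not part of Karyon.
    """
    entry = config_dict.get(program)
    if not entry:
        # Missing key or empty list: report which entries are present.
        known = ", ".join(sorted(config_dict)) or "none"
        raise LookupError(
            f"No '{program}' entry found in the configuration "
            f"(entries present: {known})."
        )
    return entry[0]
```

With this kind of lookup, a missing entry is reported together with the entries that were found, which makes it easier to tell an incomplete configuration file from a wrong path.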

Running the variant calling pipeline and plots with existing assembly

Hello,

I would like to use the pipeline for the mapping, variant calling and plotting, but I would like to skip the first parts (assembly and reduction) and proceed directly to those steps. My assembly is already in a good state. However, bin/varcall_recipee.py does not come with documentation of the required arguments. Are the scripts also available as standalone versions, or does one need to run the entire pipeline?

Thanks
Alex

installing and running just the plotting part

Hello, I am trying to install karyon. I am installing it on a cluster environment (without admin rights or apt or apt-get) mostly using conda. I am running into small things I am able to resolve, but I thought I should keep track of them as it might be useful for you.

Points 1 to 6 are just some small observations I ran into while going through the manual and trying to get the plot done.
Point 7 is something I would like to bring to your attention, because I am reviewing the manuscript right now and the software should work beforehand.

1. I think this line in the installation script should be

conda install -c bioconda -y sra-tools

instead, because as it is, I am getting

conda install: error: argument -c/--channel: expected one argument

2. I installed GATK4 via conda instead

conda install -c bioconda gatk4

3. I think sudo chmod 777 nQuire can simply be chmod 777 nQuire (sudo is unnecessary).
4. At this point, I gave up on most of the software, because all I wanted to generate are the karyon plots for the already assembled genome. It would be handy to have a reduced installation procedure for people that would like to do the genome assembly on their own.
5. Some parts of the manual pdf are not well typeset: some of the spaces in the code are missing, so copy-pasting a snippet messes up the command. Oddly enough, only some parts of the manual file have this problem.
6. all_plots.py in the manual should be allplots.py (at least that's the name of the script). Furthermore, the script requires a -p <pileup> file, while the manual instructs to specify -m example.mpileup (though -p example.mpileup actually did the job). It also complained when I tried to specify two sequencing files (R1 and R2).
7. Then I finally managed to get it to run, but then I got

python3 install/karyon/bin/allplots.py -f data/reference/genome.fa -v freebayes_Afus1_raw.vcf -p Afus1.mpileup -b data/mapped_reads/Afus1.rg.sorted.rmdup.bam -l data/trimmed_reads/Afus1/ERR5959256-trimmed-pair1.fastq.gz -o data/Afus1/karyon_plots
Traceback (most recent call last):
  File "/scratch/kjaron/install/karyon/bin/allplots.py", line 93, in <module>
    allplots(window_size, 
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 496, in allplots
    os.mkdir(kitchen)
FileNotFoundError: [Errno 2] No such file or directory: '/home/karyon/tmp/6X0UH5'

I have not installed all the components, just those I thought would be necessary for the analysis (nQuire and all the Python libraries). This is something I would like to propose as well: could you perhaps create simpler installation instructions for people interested in the ploidy analysis and plots from existing assembly/vcf files?
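
The FileNotFoundError in the traceback is what os.mkdir raises when the parent of the target directory does not exist. Here the parent /home/karyon/tmp appears to be missing on this machine; where that path comes from is not shown in the report. A minimal Python sketch of the difference between os.mkdir and os.makedirs follows. The helper make_work_dir is hypothetical and is not the code in karyonplots.py.

```python
import os
import tempfile


def make_work_dir(base_dir, name):
    """Create base_dir/name, including any missing parent directories."""
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a throwaway directory, mirroring the layout in the traceback.
with tempfile.TemporaryDirectory() as root:
    base = os.path.join(root, "karyon", "tmp")  # parent does not exist yet
    try:
        os.mkdir(os.path.join(base, "6X0UH5"))
    except FileNotFoundError:
        print("os.mkdir failed: parent directory is missing")
    print(os.path.isdir(make_work_dir(base, "6X0UH5")))
```

Until something like this is in place, creating the missing parent directory by hand, or pointing the pipeline at an existing scratch directory if the configuration allows it, may be enough to get past this error.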