
khp-informatics / ngseasy


Dockerised Next Generation Sequencing Pipeline (QC, Align, Calling, Annotation)

License: GNU General Public License v2.0

Shell 93.76% Makefile 6.19% TeX 0.05%
ngs docker next-generation-sequencing pipeline shell

ngseasy's Introduction


	I am looking for collaborators (dev help). Please get in contact.

NGSeasy: A Dockerized NGS pipeline and tool-box

Join the chat at https://gitter.im/KHP-Informatics/ngseasy

NGSeasy

With NGSeasy you can now have a full suite of NGS tools up and running on any high-end workstation in an afternoon.

We present NGSeasy (Easy Analysis of Next Generation Sequencing), a flexible and easy-to-use NGS pipeline for automated alignment, quality control, variant calling and annotation. The pipeline allows users with minimal computational/bioinformatic skills to set up and run an NGS analysis on their own samples, in less than an afternoon, on any operating system (Windows, Mac OS X or Linux) or infrastructure (workstation, cluster or cloud).

Authors: Stephen J Newhouse and Amos Folarin
Release Version: 1.0-r001
Release: dirty_tango
Citation: Folarin AA, Dobson RJ and Newhouse SJ. NGSeasy: a next generation sequencing pipeline in Docker containers [version 1; referees: 3 approved with reservations] F1000Research 2015, 4(ISCB Comm J):997 (doi: 10.12688/f1000research.7104.1).

Acknowledgements

This work is funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London.

  • Let us know if you want other tools added to NGSeasy.

NGSeasy is completely open source and we encourage interested folks to jump in and get involved in the dev with us.

Contributing to NGSeasy

  • Fork it!
  • Create your feature branch: git checkout -b my-new-feature
  • Commit your changes: git commit -am 'Add some feature'
  • Push to the branch: git push origin my-new-feature
  • Submit a pull request!

NGSeasy: Genome Comparison & Analytic Testing (GCAT) Reports

Here we provide a quick look at basic NGSeasy performance (more results coming soon).

GCAT Report | Test Data | Pipeline
NGSEASY-NTRIM-BWA-FREEBAYES-D | illumina-100bp-pe-exome-150x | fastq > bwa > freebayes
NGSEASY-NTRIM-BWA-PLATYPUS-D | illumina-100bp-pe-exome-150x | fastq > bwa > platypus

An example of the run commands:

ngseasy -c ngseasy_test.config.freebayes.tsv -d /media/Data/ngs_projects
ngseasy -c ngseasy_test.config.platypus.tsv  -d /media/Data/ngs_projects

Author Contact Details

Please contact us for help/guidance on using the beta release.

Author | Email | Twitter | LinkedIn
Dr Stephen J Newhouse | [email protected] | @s_j_newhouse | View Steve's profile on LinkedIn
Dr Amos Folarin | [email protected] | @amosfolarin | View Amos's profile on LinkedIn

Issues, Questions and Queries

Please direct all queries to [https://github.com/KHP-Informatics/ngseasy/issues]

When sending bug reports, please provide:

  • Date of Download
  • OS and version
  • Basic Machine Specs (CPU, RAM)
  • Docker Version
  • Network Speed (Testing Internet Connection Speed)
  • The command you ran, e.g. ngseasy -c my.config.tsv -d /My/Dir
  • Your config file
  • The exact error as printed to screen
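Most of the machine details requested above can be gathered with a few standard commands. A minimal sketch follows; ngseasy_report_info is a hypothetical helper (not part of NGSeasy), the RAM figure assumes Linux's /proc/meminfo, and the date it prints is the date of the report, not of the download. Add the download date by hand, or pass the path of your ngseasy clone to include its git commit, which pins the version more precisely.

```shell
#!/usr/bin/env bash
# Sketch: print system details for an NGSeasy bug report.
# ngseasy_report_info is a hypothetical helper, not part of NGSeasy.
# Optional argument: path to your ngseasy git clone (adds its commit hash).
ngseasy_report_info() {
  local repo="${1:-}" cpus mem_kb docker_ver
  # CPU count: nproc (GNU coreutils), falling back to getconf
  cpus="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)"
  # Total RAM in kB from /proc/meminfo (Linux only; empty elsewhere)
  mem_kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)"
  # Docker client version, if docker is on the PATH
  if command -v docker >/dev/null 2>&1; then
    docker_ver="$(docker --version 2>&1)"
  else
    docker_ver="docker not found"
  fi
  echo "Report date: $(date -u '+%Y-%m-%d %H:%M:%S UTC')"
  echo "OS: $(uname -srm)"
  echo "CPUs: ${cpus}"
  echo "RAM (kB): ${mem_kb:-unknown}"
  echo "Docker: ${docker_ver}"
  # Commit of the local ngseasy clone, if a repo path was given
  if [[ -n "$repo" ]] && git -C "$repo" rev-parse --short HEAD >/dev/null 2>&1; then
    echo "NGSeasy commit: $(git -C "$repo" rev-parse --short HEAD)"
  fi
}

ngseasy_report_info
```

Paste the output into your issue together with your config file and the exact error message.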

WARNING! NGSeasy is not numpty or bad data proof!

Please read the docs, stay calm, take your time and think about what you are doing...and if [www.google.com] doesn't help, then please direct all queries to [https://github.com/KHP-Informatics/ngseasy/issues].

Install Docker

Full instructions at https://docs.docker.com/.

Some fixes to make life easier: the following steps allow you to run docker without sudo.

This may differ for your OS, and mostly applies to flavours of Linux. Check with your sys admin or just Google https://www.google.com.

Mac/Windows users using http://boot2docker.io/ should be fine. Read the docs or just Google https://www.google.com.

Create a docker group

sudo addgroup docker

Add user to docker group

Here user is ec2-user

sudo usermod -aG docker ec2-user

Log out and log back in.

This ensures your user is running with the correct permissions.
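Group changes only take effect in new login sessions. The sketch below uses a hypothetical helper, session_has_group, built on id -nG without a user name: in that form id lists the groups of the current process rather than reading the group database, so it tells you whether this session has actually picked up the docker group.

```shell
#!/usr/bin/env bash
# Sketch: check whether the *current login session* already has a group.
# `id -nG` with no user argument lists the groups of the running process,
# so "docker" only shows up here after you have logged out and back in.
session_has_group() {
  local group="$1" g
  for g in $(id -nG); do
    [[ "$g" == "$group" ]] && return 0
  done
  return 1
}

if session_has_group docker; then
  echo "docker group active: you should be able to run docker without sudo"
else
  echo "docker group not active in this session: log out and back in"
fi
```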

Verify your work by running docker without sudo.

docker run hello-world

This is what you should see:

Unable to find image 'hello-world:latest' locally
Pulling repository hello-world
91c95931e552: Download complete
a8219747be10: Download complete
Status: Downloaded newer image for hello-world:latest
Hello from Docker.
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (Assuming it was not already locally available.)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

For more examples and ideas, visit:
 http://docs.docker.com/userguide/

Docker Security...

This post reviews the various security implications of using Docker to run applications within containers, and how to address them: How Secure are Containers?

Docker containers are, by default, quite secure, especially if you take care to run your processes inside the containers as non-privileged (i.e. non-root) users.

Get NGSeasy

#############################################
## Get NGSeasy                             ##
#############################################

cd /home/${USER}

git clone https://github.com/KHP-Informatics/ngseasy.git

Install NGSeasy

  • Default install directory is /home/${USER}
  • In this example the user's home is /home/ec2-user
  • make INSTALLDIR="/home/ec2-user" all
    • sets up the top-level directory structure
    • gets all docker images
    • gets indexed hg19 and b37 genomes
    • gets GATK resources for hg19 and b37 genomes
    • gets whole genome and exome test data
  • Always set your INSTALLDIR: if you run sudo make all, the install path will be /home/root. Please don't do this!
  • sudo make install installs scripts to /usr/local/bin/
#############################################
## install NGSeasy                         ##
#############################################

cd ngseasy

## 1.
make INSTALLDIR="/home/ec2-user" all

## 2.
sudo make install
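The INSTALLDIR warning above can be enforced with a small pre-flight check before step 1. This is only a sketch: validate_install_dir and is_root are hypothetical helpers, and the /home/root test simply mirrors the sudo pitfall described in the install notes.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks before `make INSTALLDIR=... all`.
# validate_install_dir and is_root are hypothetical helpers, not part of NGSeasy.
validate_install_dir() {
  local dir="${1:-}"
  if [[ -z "$dir" ]]; then
    echo "INSTALLDIR is empty: always set it explicitly" >&2
    return 1
  fi
  if [[ "$dir" != /* ]]; then
    echo "INSTALLDIR must be an absolute path: $dir" >&2
    return 1
  fi
  # /home/root is what you end up with after `sudo make all`
  if [[ "$dir" == /home/root || "$dir" == /home/root/* ]]; then
    echo "INSTALLDIR is under /home/root: did you run make with sudo?" >&2
    return 1
  fi
  if [[ ! -d "$dir" ]]; then
    echo "INSTALLDIR does not exist: $dir" >&2
    return 1
  fi
}

# True when the current shell is running as root (uid 0)
is_root() {
  [[ "$(id -u)" -eq 0 ]]
}
```

For example: INSTALLDIR="/home/ec2-user"; if ! is_root && validate_install_dir "$INSTALLDIR"; then make INSTALLDIR="$INSTALLDIR" all; fi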

Installation can take a while (1-2 hours), so go and get a coffee and just chill. If your network is slow, who knows how long it will take... still, just chill, or go and get fast internet!

NGSeasy Security

All NGSeasy applications are run as the non-root user pipeman within each container.

Recommended Network Speed

> 500 Mbit/s: anything slower will add a lot of time to the set-up (days to weeks).
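To see why line speed matters so much: transfer time is the download size in megabits divided by the line speed in Mbit/s. The sketch below (hypothetical helper download_hours) uses decimal units (1 GB = 8000 Mbit) and ignores protocol overhead, so real downloads will be slower. The 200GB figure is an illustrative assumption borrowed from the minimum "NGSeasy Install" size in the System Requirements table, not a measured download size.

```shell
#!/usr/bin/env bash
# Sketch: rough transfer time (hours) for SIZE_GB at SPEED_MBIT.
# Decimal units (1 GB = 8000 Mbit); protocol overhead is ignored.
download_hours() {
  local size_gb="$1" speed_mbit="$2"
  awk -v gb="$size_gb" -v mbit="$speed_mbit" \
    'BEGIN { printf "%.1f\n", gb * 8000 / mbit / 3600 }'
}

download_hours 200 500     # at the recommended 500 Mbit/s
download_hours 200 32.29   # at the download speed in the speedtest example
```

Under these assumptions, 200GB takes roughly 0.9 hours at 500 Mbit/s, but roughly 13.8 hours at 32.29 Mbit/s.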

Testing Internet Connection Speed

source : http://askubuntu.com/questions/104755/how-to-check-internet-speed-via-terminal

wget -O speedtest-cli https://raw.github.com/sivel/speedtest-cli/master/speedtest_cli.py
chmod +x speedtest-cli
./speedtest-cli
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Comcast Cable (x.x.x.x)...
Selecting best server based on ping...
Hosted by FiberCloud, Inc (Seattle, WA) [12.03 km]: 44.028 ms
Testing download speed........................................
Download: 32.29 Mbit/s
Testing upload speed..................................................
Upload: 5.18 Mbit/s

Install time on Amazon EC2

Connection Speed: ~ 800 Mbit/s

real    94m54.237s
user    12m26.960s
sys     28m46.648s

Note: We have only tested NGSeasy installation on Amazon EC2, Openstack and UK University Networks. These are all fairly fast networks with speeds exceeding 800 Mbit/s on average.

Running NGSeasy for the first time on the test data

Important! NGSeasy is controlled from a single config file. See ngseasy_test.config.tsv for a basic template. It is important that the user sets this up properly before running NGSeasy.

#############################################
## 0. Move to config file dir

cd /home/ec2-user/ngs_projects/config_files/

#############################################
## 1. Run basic test

ngseasy -c ngseasy_test.config.tsv -d /home/ec2-user/ngs_projects

What should happen...

This runs the following basic pipeline on Whole Exome PE 30x Illumina data, aligning to b37 (in theory...give it a try).

  • FastQC > Trimmomatic > BWA > Platypus

Some notes and pointers

  • Edit NCPU in [ngseasy_test.config.tsv] to suit your system
  • Edit PROJECT_DIR in [ngseasy_test.config.tsv] to suit your install path
  • We expect the user to place all raw fastq files in raw_fastq. NGSeasy uses this as a staging area for new project and sample data.
  • Right now, always run ngseasy from the directory that contains the config file
  • Each component of ngseasy can be run as a standalone script
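Editing NCPU (or PROJECT_DIR) can also be scripted instead of done in Excel. A minimal sketch, assuming the config's first row is a header naming each column (PROJECT_ID, SAMPLE_ID, ..., NCPU, ...); set_config_field is a hypothetical helper that prints the edited file to stdout.

```shell
#!/usr/bin/env bash
# Sketch: set one column of an NGSeasy config .tsv on every sample row.
# ASSUMPTION: row 1 is a header naming the columns. Hypothetical helper.
set_config_field() {
  local file="$1" column="$2" value="$3"
  awk -v col="$column" -v val="$value" '
    BEGIN { FS = OFS = "\t" }
    NR == 1 {
      # find the column index by header name
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    { $idx = val; print }   # overwrite that field on every data row
  ' "$file"
}
```

For example: set_config_field ngseasy_test.config.tsv NCPU "$(nproc)" > tmp.tsv && mv tmp.tsv ngseasy_test.config.tsv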

NGSeasy (Easy Analysis of Next Generation Sequencing)


NGS pipelines typically utilize a large and varied range of software components and incur a substantial configuration burden during deployment which limits their portability to different computational environments. NGSeasy simplifies this by providing the pipeline components encapsulated in Docker™ containers and bundles in a wide choice of tools for each module. Each module of the pipeline represents one functional grouping of tools (e.g. sequence alignment, variant calling etc.).

Deploying the pipeline is as simple as pulling the container images from the public repository into any host running Docker. NGSeasy can be deployed on any medium- to high-end workstation, high-performance computing cluster or compute cloud (public/private). This gives instant access to elastic scalability without investing in additional compute hardware, and makes open and reproducible research straightforward for the wider scientific community.

Advantages

  • Easy to use for non-informaticians.
  • All run from a single config file that can be made in Excel.
  • User can select from multiple aligners, variant callers and variant annotators.
  • No scary python, .yaml or .json files...just one simple Excel workbook saved as a text file.
  • Just follow our simple set of instructions and NGS away!
  • Allows reproducible research
  • Version controlled for auditing
  • Customisable
  • Easy to add new tools
  • If it's broken, we will fix it.
  • Enforced naming convention and directory structures
  • Allows users to run "Bake Offs" between tools with ease

We have adapted the current best practices from the Genome Analysis Toolkit (GATK, http://www.broadinstitute.org/gatk/guide/best-practices) for processing raw alignments in SAM/BAM format and variant calling. The current workflow has been optimised for Illumina platforms, but can easily be adapted for other sequencing platforms with minimal effort.

As the containers themselves can be run as executables with pre-specified cpu and RAM resources, the orchestration of the pipeline can be placed under the control of conventional load balancers if this mode is required.
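A minimal sketch of running one tool container with a fixed CPU and RAM budget. Assumptions: Docker 1.13 or later (where docker run --cpus was introduced), the image tag 1.0-r001 listed in Table 1 (override with NGSEASY_TAG if your images are tagged differently, e.g. 1.0), and the non-root user pipeman. run_limited and DRY_RUN are hypothetical names, not part of NGSeasy.

```shell
#!/usr/bin/env bash
# Sketch: run an NGSeasy tool image with a fixed CPU/RAM budget, so a
# scheduler or load balancer can reason about each container's footprint.
# ASSUMPTIONS: Docker >= 1.13 (for --cpus); tag taken from Table 1.
NGSEASY_TAG="${NGSEASY_TAG:-1.0-r001}"

run_limited() {
  local tool="$1" ncpu="$2" mem="$3"
  shift 3
  local cmd=(docker run --rm --cpus "$ncpu" --memory "$mem"
             --user pipeman "compbio/ngseasy-${tool}:${NGSEASY_TAG}" "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 run_limited bwa 8 32g bwa prints the docker command that would run bwa with 8 CPUs and 32GB of RAM; drop DRY_RUN=1 to actually run it.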

Genomes

Genome Build | Status
hs37d5 | coming soon
b37 | available
hg19 | available
hs38DH | coming soon

Overview of the NGSeasy Pipeline Components

The basic pipeline contains all the basic tools needed for manipulation and quality control of raw fastq files (ILLUMINA focused), SAM/BAM manipulation, alignment, cleaning (based on GATK best practices [http://www.broadinstitute.org/gatk/guide/best-practices]) and first-pass variant discovery. Separate containers are provided for in-depth variant annotation, structural variant calling, basic reporting and visualisations.


A Special note on the NGSeasy base image.

We include the following tools, which we think of as NGS Powertools, in the compbio/ngseasy-base image. These tools allow the user to slice and dice BED/SAM/BAM/VCF files in multiple ways.

  1. samtools
  2. bcftools
  3. vcftools
  4. vcflib
  5. bamUtil
  6. bedtools2
  7. ogap
  8. samblaster
  9. sambamba
  10. bamleftalign
  11. seqtk
  12. parallel

This image is used as the base of all our compbio/ngseasy-* tools.

Why not separate containers per application? The more docker-esque approach would be to have a separate container for each NGS tool. However, this ignores the fact that many of these tools interact in a deep way. Therefore, we built these into a single development environment for ngseasy, to allow pipes and streamlined system calls for manipulating the output of NGS pipelines (BED/SAM/BAM/VCF files).


The Full NGSeasy pipeline

The NGSeasy pipelines implement the following:

For academic users and/or commercial/clinical groups who have paid for GATK licensing, the next steps are to perform:

For the non-GATK version

Note: some of the later functions (i.e. variant annotation and QC reporting) are still in development.

We highly recommend read trimming prior to alignment. We have noticed considerable speed-ups in alignment time and increased quality of SNP/INDEL calls using trimmed vs raw fastq.

Base quality score recalibration is also recommended.
As an alternative to GATK, we have added functionality for using BamUtil:recab for base quality score recalibration.

Non-GATK users

  • are encouraged to use aligners such as stampy and novoalign that perform base quality score recalibration on the fly.
  • are encouraged to use variant callers that perform local re-alignment around candidate sites, to mitigate the need for the indel realignment stages.

Dockerised NGS Tools

All NGSeasy Docker images can be pulled down from compbio Docker Hub or using the Makefile.
We provide an Amazon EBS data volume with indexed genomes: XXXXXX

Table 1. NGSeasy Tools

Docker Image | Version | NGS Tool (version) | Short Description
compbio/ngseasy-base | 1.0-r001 | VCFtools (v0.1.12b) | manipulate vcf
- | - | vt (latest) | manipulate vcf
- | - | bcftools (1.2-5-g7fa0d25) | manipulate vcf
- | - | vcflib (v1.0.0) | manipulate vcf
- | - | samtools (1.2-17-ge91985a) | manipulate sam/bam
- | - | samblaster (0.1.21) | manipulate sam/bam
- | - | sambamba (v0.5.1) | manipulate sam/bam
- | - | bamUtil (1.0.13) | manipulate sam/bam
- | - | bedtools (v2.23.0-10-g447cb97) | manipulate bed files
- | - | seqtk (1.0-r77-dirty) | manipulate fastq
- | - | vawk (0.0.2) | manipulate vcf
- | - | bioawk (latest) | manipulate sam/bam/vcf
compbio/ngseasy-fastqc | 1.0-r001 | fastqc (v0.11.2) | FASTQ Quality Control Plots
compbio/ngseasy-trimmomatic | 1.0-r001 | trimmomatic (0.32) | FASTQ Quality Trimming
compbio/ngseasy-bwa | 1.0-r001 | bwa (0.7.12-r1039) | Aligner
compbio/ngseasy-stampy | 1.0-r001 | stampy (stampy-1.0.27) | Aligner
compbio/ngseasy-snap | 1.0-r001 | snap-aligner (1.0beta.18) | Aligner
compbio/ngseasy-bowtie2 | 1.0-r001 | bowtie2 (2.2.4) | Aligner
compbio/ngseasy-novoalign | 1.0-r001 | novoalign (3.02.13) | Aligner
compbio/ngseasy-gatk | 1.0-r001 | gatk (3.4-0) | NGS PowerTools
compbio/ngseasy-picardtools | 1.0-r001 | picardtools (1.128) | NGS PowerTools
compbio/ngseasy-glia | 1.0-r001 | glia (latest) | NGS local realignment
compbio/ngseasy-platypus | 1.0-r001 | platypus (0.8.1) | Variant Caller
compbio/ngseasy-freebayes | 1.0-r001 | freebayes (v0.9.21-19-gc003c1e) | Variant Caller

To add:

  • ABRA

Running an NGSeasy Tool Interactively

Run as non-root user pipeman.

-v /media/Data:/home/pipeman : Mounts local directory /media/Data to container directory /home/pipeman

TOOL="bwa"

docker run \
-P \
-w /home/pipeman \
-e HOME=/home/pipeman \
-e USER=pipeman \
--user pipeman \
-v /media/Data:/home/pipeman \
-it compbio/ngseasy-${TOOL}:1.0 /bin/bash

Dockerised NGSeasy


The following section describes getting the Dockerised NGSeasy Pipeline(s) and Resources, project set up and running NGSeasy.

Getting all resources and building the required tools will take a few hours, depending on network connections and any random "ghosts in the machine"; in reality, allow half a day. But once you're set up, that's it: you are good to go.

System Requirements

See the System Requirements table below for our recommended system requirements. NGSeasy will run on any modern computer/workstation or cloud infrastructure. The hard disk requirements are based on our experience and reflect the fact that the pipeline/tools produce a range of intermediate and temporary files for each sample.

The full NGSeasy install includes indexed genomes for hg19 and b37 for all aligners, annotation files from GATK resource, and all of the NGSeasy docker images. Additional disk space is needed if the user wishes to install the databases associated with the variant annotators, Annovar, VEP and snpEff.

Based on our experience, a functional basic NGS compute system for a small lab would consist of at least 4TB of disk space, 60GB RAM and at least 32 CPU cores. Internet speed and network connectivity are a major bottleneck when dealing with NGS-sized data, and groups are encouraged to think about these issues before embarking on multi-sample or population-level studies, where compute requirements can very quickly escalate.

System Requirements

Component | Minimum | Recommended
RAM | 16GB | 48-60GB
CPU | 8 cores | 16-36 cores
Hard Disk (per sample) | 50-100GB | 200-500GB
NGSeasy Install | 200GB | 500GB
Annotation Databases | 500GB | >1TB

Installing Docker

Follow the simple instructions in the links provided below

A full set of instructions for multiple operating systems are available on the Docker website.

Getting NGSeasy

We provide a simple Makefile to pull all of the public NGSeasy components and scripts, and to set up the correct project directory structure on your local machine.

Setting up the initial project can take up to a day, depending on your local network connections and speeds.

The default install dir is the user's ${HOME} directory. The Makefile provides options to install to any user-defined directory and to select the NGSeasy version, e.g.:

## EG. Installing to /media/scratch
make INSTALLDIR="/media/scratch" VERSION="1.0" all

The Makefile also allows installation of selected components (check out its insides!).

Set up NGSeasy Project configuration file

Using Excel or a similar tool, make a [config.file.tsv] file and save it as a [TAB]-delimited file with a .tsv extension. This sets up information related to: Project Name, Sample Name, Library Type, Pipeline to call, NCPU.

We provide a template that can be used with NGSeasy, see ngseasy_test.config.tsv.

The [config.file.tsv] should contain the following 23 columns for each sample to be run through a pipeline:-

Variable | Type | Description | Options (Examples)
PROJECT_ID | STRING | Project ID | Cancer
SAMPLE_ID | STRING | Sample ID | SAMPLE_I
FASTQ1 | STRING | Read 1 Fastq | foo_R1.fq.gz
FASTQ2 | STRING | Read 2 Fastq | foo_R2.fq.gz
PROJECT_DIR | STRING | ngseasy project dir | /media/scratch/ngs_projects
DNA_PREP_LIBRARY_ID | STRING | NGS Library |
NGS_PLATFORM | STRING | NGS Platform | ILLUMINA
NGS_TYPE | STRING | NGS Type | WEX (exome), WGS (genome), TGS (targeted)
BAIT | STRING | Bait bed file | FOO.bed
CAPTURE | STRING | Capture bed file | BAR.bed
GENOMEBUILD | STRING | Genome version | hg19, b37, b38 (coming soon)
FASTQC | STRING | Select fastqc | no-fastqc, qc-fastqc
TRIM | STRING | Select trimming | no-trimm, atrimm, btrimm
BSQR | STRING | Select BSQR | no-bsqr, bam-bsqr, gatk-bsqr
REALN | STRING | Select realignment | no-realn, bam-realn, gatk-realn
ALIGNER | STRING | Select aligner | no-aln, bwa, stampy, snap, novoalign, bowtie2
VARCALLER | STRING | Select variant caller | no-varcall, freebayes, platypus, UnifiedGenotyper, HaplotypeCaller, ensemble
CNV | STRING | Select CNV caller | no-sv, all-sv, lumpy, delly, slope, exomedepth, mhmm, cnvnator
ANNOTATOR | STRING | Select variant annotator | no-anno, snpeff, annovar, vep
CLEANUP | STRING | Clean up temp files | TRUE, FALSE
NCPU | NUMBER | Number of cores | 1 .. N
VERSION | NUMBER | NGSeasy version | 1.0
NGSUSER | STRING | User email | [email protected]
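Configs saved from Excel are easy to get subtly wrong, for example a missing column or Windows (CRLF) line endings. A sketch of a pre-flight check, assuming every non-empty row (header included) should have the 23 tab-separated columns listed above; check_config is a hypothetical helper, not part of NGSeasy.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an NGSeasy config file saved from Excel.
# Every non-empty row must have exactly N tab-separated fields (default 23);
# also warns about Windows CRLF line endings. Hypothetical helper.
check_config() {
  local file="$1" expected="${2:-23}"
  if grep -q $'\r' "$file"; then
    echo "warning: $file has CRLF (Windows) line endings" >&2
  fi
  awk -v n="$expected" '
    BEGIN { FS = "\t"; bad = 0 }
    { sub(/\r$/, "") }   # ignore a trailing CR when counting fields
    NF == 0 { next }     # skip blank lines
    NF != n {
      printf "line %d: expected %d columns, found %d\n", NR, n, NF > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$file"
}
```

If it warns about CRLF endings, dos2unix my.config.tsv or (with GNU sed) sed -i 's/\r$//' my.config.tsv will strip them.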

The NGSeasy project directory

The user needs to make the relevant directory structure on their local machine before starting an NGS run.

On our system we typically set up a top-level directory called ngs_projects, within which we store output from all our individual NGS projects.

Here we are working from a local top-level directory called media/, but this can be any folder on your local system, e.g. your home directory /home/${USER}.

Within this directory (media) we make the following folders:

ngs_projects  
|  
|__raw_fastq  
|__config_files  
|__ngseasy_resources  
   |  
   |__reference_genomes_b37  
   |__reference_genomes_hg19
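If you prefer to create this layout by hand, a single mkdir -p call with bash brace expansion is enough. A sketch; make_ngs_projects is a hypothetical helper, and the Makefile's all target already sets up the top-level directory structure for you.

```shell
#!/usr/bin/env bash
# Sketch: create the top-level ngs_projects layout under BASE_DIR.
# Hypothetical helper; `make ... all` already does this for you.
make_ngs_projects() {
  local base="${1:?usage: make_ngs_projects BASE_DIR}"
  # brace expansion yields one path per folder; -p creates parents as needed
  mkdir -p "${base}/ngs_projects/"{raw_fastq,config_files} \
           "${base}/ngs_projects/ngseasy_resources/"{reference_genomes_b37,reference_genomes_hg19}
}
```

For example: make_ngs_projects /media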

Running the script make XXXX ensures that all relevant directories are set up, and also enforces a clean structure to the NGS project.

Within this we make a raw_fastq folder, where we temporarily store all the raw fastq files for each project. This folder acts as an initial staging area for the raw fastq files. During project set-up, we copy/move project/sample-related fastq files to their own specific directories. Fastq files must be gzipped and have the suffix _1.fq.gz or _2.fq.gz. Future versions will allow any format.

Running ngseasy with the relevant configuration file will set up the following directory structure for every project, and for every sample within a project:

.
ngs_projects  
|  
|__raw_fastq  
|__config_files  
|__run_logs
|__ngseasy_resources
|
|__ project_id  
	|  
	|__run_logs  
	|__config_files  
	|
	|__sample_id_1  
	|	|  
	|	|__fastq  
	|	|__tmp  
	|	|__alignments  
	|	|__vcf  
	|	|__reports  
	|	|__config_files  
	|
	|
	|__sample_id_n  
		|  
		|__fastq  
		|__tmp  
		|__alignments  
		|__vcf  
		|__reports  
		|__config_files  

The raw_fastq Directory

The raw_fastq directory is a very special directory indeed. This is where the user should copy and/or move ALL NEW RAW FASTQ files. It is to be used as an initial staging area for all fastq files. NGSeasy expects all raw fastq data to be placed here for all new samples or runs. NGSeasy inspects this folder and looks for the fastq file names specified in your config file. If NGSeasy doesn't find them, it exits. We do this to force the user to get organised.
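You can run the same check yourself before launching a run. A sketch, assuming the config has a header row and that FASTQ1/FASTQ2 are columns 3 and 4, as in the config table above; check_raw_fastq is a hypothetical helper and is not the check NGSeasy itself performs.

```shell
#!/usr/bin/env bash
# Sketch: check that every FASTQ1/FASTQ2 file named in a config is present
# in raw_fastq. ASSUMPTIONS: header row; FASTQ1/FASTQ2 are columns 3 and 4.
check_raw_fastq() {
  local config="$1" raw_dir="$2" missing=0 fq
  while IFS= read -r fq; do
    [[ -z "$fq" ]] && continue   # skip empty cells
    if [[ ! -f "${raw_dir}/${fq}" ]]; then
      echo "missing: ${raw_dir}/${fq}" >&2
      missing=1
    fi
  done < <(awk -F '\t' 'NR > 1 { sub(/\r$/, ""); print $3; print $4 }' "$config")
  return "$missing"
}
```

For example: check_raw_fastq ngseasy_test.config.tsv /home/ec2-user/ngs_projects/raw_fastq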


Manually Build required NGSeasy Container Images

Work In Progress...

Currently we are not able to automatically build some of the tools in pre-built docker containers due to licensing restrictions.

Some of the software has restrictions on use, particularly for commercial purposes. Therefore, if you wish to use this for commercial purposes, you legally have to approach the owners of the various components yourself!

Software in the pipeline that requires registration:

These tools require manual download and registration with the provider. For non-academic/commercial groups, you will need to pay for some of these tools.

Once you have paid/registered and downloaded the tool, we provide scripts and guidance for building these tools on your system.

It's as easy as:

docker build -t compbio/ngseasy-${TOOL} .

Building NOVOALIGN

Download Novoalign from http://www.novocraft.com/ into the local build directory ngseasy/containerized/ngs_docker_debian/ngs_aligners/ngseasy_novoalign. Edit the Dockerfile to reflect the correct version of novoalign.

To use all novoalign functionality, you will need to pay for a license.

Once you have obtained your novoalign.lic, copy it to the build directory ngseasy/containerized/ngs_docker_debian/ngs_aligners/ngseasy_novoalign, which should now contain your updated Dockerfile.

# move to ngseasy_novoalign folder
cd ngseasy/containerized/ngs_docker_debian/ngs_aligners/ngseasy_novoalign
ls

The directory should contain the following:

Dockerfile
novoalign.lic
README.md
novosortV1.03.01.Linux3.0.tar.gz
novocraftV3.02.08.Linux3.0.tar.gz
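Because docker build only fails part-way through when an archive or licence file is missing, it can help to check the build directory first. A sketch; check_build_dir is a hypothetical helper, and the file names you pass must match the versions you actually downloaded (the same check works for the GATK build directory).

```shell
#!/usr/bin/env bash
# Sketch: confirm a manual build directory holds every expected file
# before running `docker build`. Hypothetical helper, not part of NGSeasy.
check_build_dir() {
  local dir="$1" f ok=0
  shift
  for f in "$@"; do
    if [[ ! -e "${dir}/${f}" ]]; then
      echo "missing in ${dir}: ${f}" >&2
      ok=1
    fi
  done
  return "$ok"
}
```

For example: check_build_dir . Dockerfile novoalign.lic novosortV1.03.01.Linux3.0.tar.gz novocraftV3.02.08.Linux3.0.tar.gz && docker build -t compbio/ngseasy-novoalign:v1.0 .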

build novoalign

# build
docker build -t compbio/ngseasy-novoalign:v1.0 .

Building GATK

You need to register and accept the GATK license agreement at https://www.broadinstitute.org/gatk/.

Once done, download GATK and place it in the GATK build directory ngseasy/containerized/ngs_docker_debian/ngs_utils/ngseasy_gatk.

Edit the Dockerfile to reflect the correct version of GATK.

# move to ngseasy_gatk folder
cd ngseasy/containerized/ngs_docker_debian/ngs_utils/ngseasy_gatk
ls

The directory should contain the following:

Dockerfile
README.md
GenomeAnalysisTK-3.3-0.tar.bz2

build gatk

# build
docker build -t compbio/ngseasy-gatk:v1.0 .

Manually Build NGSeasy Variant Annotation Container Images

The tools used for variant annotation use large databases, and the docker images exceed 10GB. Therefore, the user should manually build these container images prior to running the NGS pipelines. Docker build files (Dockerfiles) are available for:

Note: Annovar requires user registration.

Once built on the user system, these container images can persist for as long as the user wants.

Large Variant Annotation Container Images

It's as easy as:

docker build -t compbio/ngseasy-${TOOL} .

Build VEP

cd /media/ngs_projects/nsgeasy/ngs/containerized/ngs_docker_debian/ngseasy_vep

sudo docker build -t compbio/ngseasy-vep:${VERSION} .

Build Annovar

cd /media/ngs_projects/nsgeasy/ngs/containerized/ngs_docker_debian/ngseasy_annovar

sudo docker build -t compbio/ngseasy-annovar:${VERSION} .

Build snpEff

cd /media/ngs_projects/nsgeasy/ngs/containerized/ngs_docker_debian/ngseasy_snpeff

sudo docker build -t compbio/ngseasy-snpeff:${VERSION} .
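The three builds above follow the same pattern, so they can be scripted as a loop. A sketch, assuming the ngseasy_vep, ngseasy_annovar and ngseasy_snpeff directories sit side by side under one build root (as in the paths above) and that your user can run docker without sudo; build_annotators and DRY_RUN are hypothetical names. Passing the directory to docker build as its build context is equivalent to cd-ing into it and using ".".

```shell
#!/usr/bin/env bash
# Sketch: build the three annotation images in one loop.
# ASSUMPTIONS: all three build dirs live under BUILD_ROOT; docker runs
# without sudo. Set DRY_RUN=1 to print the commands instead.
build_annotators() {
  local build_root="$1" version="${2:?usage: build_annotators BUILD_ROOT VERSION}" tool
  for tool in vep annovar snpeff; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "docker build -t compbio/ngseasy-${tool}:${version} ${build_root}/ngseasy_${tool}"
    else
      # stop at the first failed build
      docker build -t "compbio/ngseasy-${tool}:${version}" "${build_root}/ngseasy_${tool}" || return 1
    fi
  done
}
```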

Coming Soon

Useful Links


(C) 2015 Kings College London


Development funded as part of:
NIHR Maudsley Biomedical Research Centre (BRC), King's College London and the
Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London
PHI Data Lab:
Institute of Psychiatry, Psychology & Neuroscience, King's College London.

ngseasy's People

Contributors

afolarin, alfredokcl, biocyberman, dalloliogm, fjrmoreews, gitter-badger, sjnewhouse, snewhouse, sqvdusers


ngseasy's Issues

permission denied in fastqc container

When running ngseasy -c config.tsv -d /home/bob/ngs_projects, the sample fastq folder doesn't contain any fastqc results, and the trimmomatic step fails with a permission denied exception:

Exception in thread "main" java.io.FileNotFoundException: /home/pipeman/ngs_projects/TEST_PROJECT/TEST_SAMPLE/fastq/illumina.100bp.pe.wex.30x_1.fastq.gz (Permission denied)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:127)
    at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:251)
    at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:498)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:35)

If you look at the fastqc docker container output it has something like this:

$ docker logs 73ea98e292ad
Skipping '/home/pipeman/ngs_projects/TEST_PROJECT/TEST_SAMPLE/fastq/illumina.100bp.pe.wex.30x_1.fastq.gz' which didn't exist, or couldn't be read
Skipping '/home/pipeman/ngs_projects/TEST_PROJECT/TEST_SAMPLE/fastq/illumina.100bp.pe.wex.30x_2.fastq.gz' which didn't exist, or couldn't be read

So no fastqc output was generated. After checking the file permissions, I noticed that ngseasy_initiate_fastqc runs chmod -R 776 ${sout}/* at the very end, which means that the fastq sample folder has no "x" permission for other users before the fastqc step is run, so the fastqc step fails.

I've also noticed that some other ngseasy steps (like bsqr and variant calling) run chmod -R 777 ${OUT}/* at the end, so after the pipeline has finished all permissions look fine, and it is very hard to track down where the problem is and why fastqc and trimmomatic failed with permission denied.
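Why 776 breaks things: that mode (rwxrwxrw-) leaves other users without execute (search) permission on directories, and a directory without it cannot be entered. Assuming the container user pipeman is neither the owner nor in the group of the host files, it therefore cannot open the fastq files inside, whatever the files' own permissions are. A sketch of one possible fix, not necessarily the one adopted by the project: chmod's capital X adds execute only to directories and to files that are already executable by someone, so directories become traversable without marking data files executable. fix_ngs_perms is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: make a sample directory tree readable and traversable by other
# users (e.g. the container user pipeman). Capital X adds execute only to
# directories and already-executable files, unlike numeric modes such as 776.
fix_ngs_perms() {
  chmod -R u+rwX,g+rwX,o+rX "$1"
}
```

For example: fix_ngs_perms "${sout}" in place of the chmod -R 776 call.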

INSTALLDIR

Just a minor quibble, but one that might trip users up: the Readme refers to "INTSALLDIR", but the Makefile expects "INSTALLDIR".

ngseasy_initiate_project creates 'recursive' directories

Hello,
I've just noticed that running the ngseasy_initiate_project script creates a 'recursive' project structure: the path to my ngs_projects directory is recreated relative to ngs_projects,
e.g. /home/bob/ngs_projects/home/bob/ngs_projects/my_project

Line 223 in the ngseasy_initiate_project script prints $5/$1 from the tsv config, which gives: /home/bob/ngs_projects/my_project

Line 230 in the ngseasy_initiate_project script creates the directory "${project_directory}/${PROJECTNAME}", which according to the script is "/home/bob/ngs_projects/home/bob/ngs_projects/my_project".

Also, the "-f" option in line 233 doesn't work for me.

ngseasy stopping in ngseasy_variant_calling step

I downloaded and installed ngseasy.
The first steps seem to run fine, but I get a warning saying that the config file doesn't exist, and the pipeline stops after a few seconds.

The file /home/ubuntu/ngs_projects/config_files/ngseasy_test.config.tsv exists, and it is a copy of your test config file, downloaded during the make testdata step. I had to edit the file because it contained some ^M characters.

sudo ngseasy -c /home/ubuntu/ngs_projects/config_files/ngseasy_test.config.tsv -d /home/ubuntu/ngs_projects/ -p 1 -f 1
http://pastebin.com/wNnzRfbv

Cleanup RUNs don't reduce image size

Just wanted to point out that deleting temporary files in a Dockerfile doesn't reduce image size, unless you delete them in the same RUN that created them.

This is because each RUN command builds a separate image layer. After a layer has been built, any files that it added to the image are "baked in" and will contribute to final image size even if you delete them in a subsequent layer. In such a case, the deletion only serves to make files inaccessible in the running container.

The solution is to execute downloads/installs and cleanups in the same RUN, so only the net result gets baked into the image. Files that were added then removed in the same RUN are completely excluded from the single resulting layer.

RUN mkdir temp \
 && curl -fsSL https://example.com/example.tar.gz | tar xz -C temp \
 && apt-get update \
 && apt-get install -y some-app \
 && some-app temp \
 && apt-get remove -y some-app \
 && apt-get clean \
 && rm -rf temp /var/lib/apt/lists/*

Wrong path to ngseasy_resources in ngseasy_trimmomatic script...

Hi Steve,

It seems you've quite recently changed the path to the references directory. So now it should be /ngs_projects/ngseasy_resources. But in some scripts I've noticed that another path is used.

Here are the details:

In ngseasy_trimmomatic (https://github.com/KHP-Informatics/ngseasy/blob/master/bin/ngseasy_trimmomatic) there is the following code (line 467):
...
myresources=`dirname ${PROJECT_DIR}`
NGSResources="${myresources}/ngs_resources"
...
which implies that the ngs_resources directory should sit at the same level as ngs_projects (where PROJECT_DIR=/home/bob/ngs_projects)

and there is a corresponding mapping for docker volumes (line 477):
...
-v ${PROJECT_DIR}:/home/pipeman/ngs_projects
-v ${NGSResources}:/home/pipeman/ngs_resources
...
However, one of the parameters to the same docker run command (line 486):

ILLUMINACLIP:${adapter_fa}:2:30:10:5:true \

contains the variable adapter_fa ('/home/pipeman/ngs_projects/ngseasy_resources/reference_genomes_b37/contaminant_list.fa'), which points to yet another path for the reference genome resources...

Thank you,
Olga
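This sketch traces the mismatch Olga describes. It derives NGSResources with `dirname`, as in line 467, and then maps the container-side `adapter_fa` back to the host path it is actually read from. The mapping uses only the two `-v` flags from line 477. `container_to_host` is a hypothetical helper, and `/home/bob` is the example user from the report.

```bash
#!/bin/sh
# Host-side paths, derived as in ngseasy_trimmomatic (line 467).
PROJECT_DIR=/home/bob/ngs_projects
myresources=$(dirname "${PROJECT_DIR}")        # /home/bob
NGSResources="${myresources}/ngs_resources"    # /home/bob/ngs_resources

# adapter_fa as it appears in the docker run command (line 486).
adapter_fa=/home/pipeman/ngs_projects/ngseasy_resources/reference_genomes_b37/contaminant_list.fa

# container_to_host PATH: print the host path that a container path maps to,
# using the two -v mappings from line 477. Fails if no mount covers PATH.
container_to_host() {
    case "$1" in
        /home/pipeman/ngs_projects/*)
            echo "${PROJECT_DIR}${1#/home/pipeman/ngs_projects}" ;;
        /home/pipeman/ngs_resources/*)
            echo "${NGSResources}${1#/home/pipeman/ngs_resources}" ;;
        *)
            return 1 ;;
    esac
}

# Prints /home/bob/ngs_projects/ngseasy_resources/..., so adapter_fa is read
# through the ngs_projects mount and not through the ngs_resources mount.
container_to_host "$adapter_fa"
```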

Re Dev : To Do

New branch in Git repo

  • make a new branch

f1000_dev
on image
/home/ubuntu/scratch/ngseasy

OpenStack VM

  • space
  • send key to Amos
  • 30+ CPU
  • max RAM
  • Volume : 4TB

Images

  • build images
  • build tool set
  • build one image with all tools

Get Genomes

  • hg19.fasta
  • hs37d5.fasta
  • GRCh38.p7.fasta
  • hs38DH.fasta
  • gatk resources bundles
17.05.2016
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz

Get test data

  • small 30-150x data set

Index Genomes

  • bwa
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • snap
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • novoalign
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • bowtie2
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
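The indexing matrix above can be driven by a single loop. Below is a minimal dry-run sketch that only prints the commands, so they can be reviewed before anything runs. The commands assume each tool's standard usage: `bwa index <fasta>`, `snap-aligner index <fasta> <outdir>`, `novoindex <index> <fasta>` and `bowtie2-build <fasta> <prefix>`. Output names and the helper name are hypothetical, and options (seed size, memory, threads) should be checked against each tool's documentation.

```bash
#!/bin/sh
# print_index_cmds GENOME_DIR: print (not run) one index command per
# aligner x genome pair from the plan above. Output names are assumptions.
print_index_cmds() {
    dir="$1"
    for g in hg19 hs37d5 hs38DH; do
        fa="${dir}/${g}.fasta"
        echo "bwa index ${fa}"
        echo "snap-aligner index ${fa} ${dir}/${g}.snap"
        echo "novoindex ${dir}/${g}.novoindex ${fa}"
        echo "bowtie2-build ${fa} ${dir}/${g}"
    done
}

# Review the list, then pipe it to sh to actually build the indexes:
# print_index_cmds /path/to/reference_genomes | sh
```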

bwa

├── hs37d5.fasta
├── hs37d5.fasta.amb
├── hs37d5.fasta.ann
├── hs37d5.fasta.bwt
├── hs37d5.fasta.pac
├── hs37d5.fasta.sa
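The listing above shows the five files that `bwa index` writes next to the FASTA: .amb, .ann, .bwt, .pac and .sa. Below is a minimal self-check sketch that re-indexes only when one of them is missing or empty. `has_bwa_index` is a hypothetical helper name.

```bash
#!/bin/sh
# has_bwa_index FASTA: succeed only if every bwa index file for FASTA exists
# and is non-empty; report each missing file on stderr.
has_bwa_index() {
    fa="$1"
    ok=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${fa}.${ext}" ]; then
            echo "missing: ${fa}.${ext}" >&2
            ok=1
        fi
    done
    return "$ok"
}

# Usage: only index when needed.
# has_bwa_index hs37d5.fasta || bwa index hs37d5.fasta
```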

PLAN BY MONDAY 23rd

giab_data_indexes

https://github.com/genome-in-a-bottle/giab_data_indexes

Test Data

  • 30x Exome
  • 150x Exome
  • 1x WGS at 30x min. (source a better WGS data set, as the X10 data is poor and messy)

GATK Gold Standard Run

  • run bwa-realign-bsqr-haplotypecaller on all 3 data sets

This is the "Gold Standard". This will take a week if there are no bugs.

The Glue

Open :-

  1. BASH done better than before
  • logging
  • read a user supplied config file (spreadsheet like)
  • user specifies the pipeline
  • SJN TO ADD CONFIG PARAMETER LIST
  • consider converting to .yaml behind the scenes
  • self checks: if the input exists, move on
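The "read a user supplied config file" and "self checks" items can be sketched in plain shell. The first function below skips the header row and checks that every data row has the expected number of tab-separated fields; ngseasy's own log reports checking for 23. The second succeeds only when a step's output is missing, so a re-run can skip finished work. Both function names are hypothetical, and the sketch does not reproduce the real ngseasy config layout.

```bash
#!/bin/sh
# check_config TSV NCOLS: fail if any data row (after the header) of TSV does
# not have exactly NCOLS tab-separated fields; report offending rows on stderr.
check_config() {
    awk -F'\t' -v n="$2" '
        NR > 1 && NF != n {
            printf "line %d: %d fields, expected %d\n", NR, NF, n > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# needs_run OUTPUT: succeed if OUTPUT is missing or empty, i.e. the step
# still has to run; if it already exists, the caller can move on.
needs_run() {
    [ ! -s "$1" ]
}

# Usage:
# check_config ngseasy_test.config.tsv 23 || exit 1
# needs_run sample.bam && run_alignment    # run_alignment is hypothetical
```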

RECON BY MONDAY NEXT WEEK

ngseasy logs an error when there is no ~/ngseasy_tmp directory

The error is:

/nfs/ngseasy/ngseasy/bin/ngseasy_initiate_project: line 223: /home/ansible/ngseasy_tmp/Project_list_1.0-r001.ngseasy_test1.config.tsv.261015: No such file or directory

The full log is:

$ ngseasy -c ngseasy_test1.config.tsv -d ~/ngs_projects


###########################################################################################
#
# Program: ngseasy
# Version 1.0-r001
# Authors: Stephen Newhouse ([email protected]); Amos Folarin ([email protected])
#
# Copyright (C) 2015  Stephen Jeffrey Newhouse and Amos Folarin
# NGSeasy (aka ngseasy) Version 1.0-r001 comes with ABSOLUTELY NO WARRANTY;
# for details see the GNU General Public License.
# This is free software, and you are welcome to redistribute it under certain conditions;
# see the GNU General Public License for details.
#
###########################################################################################


Docker Installed ok
Checking Docker Version and Info


----
Docker Version
----
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64
----
Docker Info
----
Containers: 31
Images: 49
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 111
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-48-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 31.42 GiB
Name: exp-02
ID: SELY:3VB2:UG7J:IV5P:VK2F:CTYM:IW3L:RTSS:HSVD:XITZ:ZK6D:SASD
----

[Mon Oct 26 11:19:04 GMT 2015]:[NGSEASY:1.0-r001]:[Log:START]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
CONFIG FILE [-c] = ngseasy_test1.config.tsv
PROJECT DIR [-d] = /home/ansible/ngs_projects
[Mon Oct 26 11:19:05 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Log file [/home/ansible/ngs_projects/run_logs/ngseasy.1.0-r001.ngseasy_test1.config.tsv.261015.log]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:05 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Check if project dir [/home/ansible/ngs_projects] exists:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:05 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Current working directory [/home/ansible/ngs_projects/config_files]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:05 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Config file set as [ngseasy_test1.config.tsv]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:06 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Path to config file directory detected as [/home/ansible/ngs_projects/config_files]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:06 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Setting Path to config file:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:06 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Config file location set to [/home/ansible/ngs_projects/config_files]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Moving to Config file location [/home/ansible/ngs_projects/config_files] to make life easy:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Config file name [ngseasy_test1.config.tsv] detected:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Reading [ngseasy_test1.config.tsv] :[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Number of fields detected [23] :[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Numcols of [ngseasy_test1.config.tsv] ok:[23]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Calling ngseasy_initiate_project:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]


###########################################################################################
#
# Program: ngseasy_initiate_project
# Version 1.0-r001
# Authors: Stephen Newhouse ([email protected]); Amos Folarin ([email protected])
#
# Copyright (C) 2015  Stephen Jeffrey Newhouse and Amos Folarin
# NGSeasy (aka ngseasy) Version 1.0-r001 comes with ABSOLUTELY NO WARRANTY;
# for details see the GNU General Public License.
# This is free software, and you are welcome to redistribute it under certain conditions;
# see the GNU General Public License for details.
#
###########################################################################################


[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Check if project dir [/home/ansible/ngs_projects] exists:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Log file [/home/ansible/ngs_projects/run_logs/ngseasy.1.0-r001.ngseasy_test1.config.tsv.261015.log]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Current working directory [/home/ansible/ngs_projects/config_files]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:07 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Config file set as [ngseasy_test1.config.tsv]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:08 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Path to config file directory detected as [/home/ansible/ngs_projects/config_files]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:08 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Setting Path to config file:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:08 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Config file location set to [/home/ansible/ngs_projects/config_files]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:08 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Moving to Config file location [/home/ansible/ngs_projects/config_files] to make life easy:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:08 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:Config file name [ngseasy_test1.config.tsv] detected:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
[Mon Oct 26 11:19:08 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy_initiate_project]:CMD: awk 'NR >1 {print /-c}' ngseasy_test1.config.tsv | sort | uniq > /home/ansible/ngseasy_tmp/Project_list:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
/nfs/ngseasy/ngseasy/bin/ngseasy_initiate_project: line 223: /home/ansible/ngseasy_tmp/Project_list_1.0-r001.ngseasy_test1.config.tsv.261015: No such file or directory
[Mon Oct 26 11:19:08 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Calling ngseasy_initiate_fastq:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
...
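The log above shows two separate problems. First, `${HOME}/ngseasy_tmp` does not exist, so the redirect on line 223 fails. Second, the logged CMD reads `{print /-c}`. This happens because `$5"/"$1` sat inside a double-quoted log message, so the shell expanded `$5` (empty) and `$1` (here `-c`) before anything was logged. Below is a minimal sketch that addresses both. It keeps the original column choice (column 5 "/" column 1), but `make_project_list` is a hypothetical helper, not the actual ngseasy_initiate_project code.

```bash
#!/bin/sh
# make_project_list CONFIG_TSV TMP_DIR: write the unique "project/sample"
# pairs (column 5 "/" column 1, as in ngseasy_initiate_project) to
# TMP_DIR/Project_list.
make_project_list() {
    config_tsv="$1"
    tmp_dir="$2"
    # Create the directory first, so the redirect below cannot fail
    # with "No such file or directory".
    mkdir -p "$tmp_dir" || return 1
    # Escape \$ in the log message: inside double quotes the shell would
    # otherwise expand $5 and $1 itself, which produced "print /-c".
    echo "CMD: awk 'NR > 1 { print \$5 \"/\" \$1 }' ${config_tsv} | sort -u > ${tmp_dir}/Project_list" >&2
    # Single quotes pass $5 and $1 to awk untouched.
    awk 'NR > 1 { print $5 "/" $1 }' "$config_tsv" | sort -u > "${tmp_dir}/Project_list"
}

# Usage: make_project_list ngseasy_test1.config.tsv "${HOME}/ngseasy_tmp"
```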

Testing Network Speed : speedtest-cli

I recommend the speedtest-cli tool for this. I wrote a blog post (Measure Internet Connection Speed from the Linux Command Line) that goes into detail on downloading, installing and using it.

The short version is this:

$ wget -O speedtest-cli https://raw.github.com/sivel/speedtest-cli/master/speedtest_cli.py
$ chmod +x speedtest-cli
$ ./speedtest-cli
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Comcast Cable (x.x.x.x)...
Selecting best server based on ping...
Hosted by FiberCloud, Inc (Seattle, WA) [12.03 km]: 44.028 ms
Testing download speed........................................
Download: 32.29 Mbit/s
Testing upload speed..................................................
Upload: 5.18 Mbit/s

GeL Env Pipeline Test

  • Using only internal resources and applications, carry out a multi-work-unit test
  • Identify appropriate resources (data)
  • Identify an appropriate 3-application chain
  • raw data R1_fastq R2_fastq
  • where is the reference genome?
  • fastqc > trimmomatic > bwa > freebayes
  • inputs: R1/R2 fastq, reference genome and reference genome bwa index
  • Can add resources but can't find the Resource Basket
    -- In the destination project: Resource Basket in the top-right menu item
    -- Select all, then create the dataset from the Resource Basket
  • We have two projects in List All: i) test-die and ii) Reference Genomes. How do we get fastq data? How do we get added to projects?
  • How do we get data in (e.g. Create Dataset)?
  • Went to Work Unit >> List All --> Universal Importer (adds to Resource Basket), then created a dataset project out of this to add
    -- We had to guess this!
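Below is a dry-run sketch of the fastqc > trimmomatic > bwa > freebayes chain for one paired-end sample. It only prints the commands. The jar name `trimmomatic.jar`, the output file names and the trimming settings (SLIDINGWINDOW, MINLEN) are placeholders, not GeL or ngseasy values. The reference is assumed to already have a bwa index and a `samtools faidx` index.

```bash
#!/bin/sh
# print_chain R1 R2 REF OUT: print the commands for one paired-end sample.
# REF is assumed to have bwa index files and a .fai index already.
print_chain() {
    r1="$1"; r2="$2"; ref="$3"; out="$4"
    echo "fastqc -o ${out} ${r1} ${r2}"
    echo "java -jar trimmomatic.jar PE -phred33 ${r1} ${r2}" \
         "${out}/R1.paired.fq.gz ${out}/R1.unpaired.fq.gz" \
         "${out}/R2.paired.fq.gz ${out}/R2.unpaired.fq.gz" \
         "SLIDINGWINDOW:4:20 MINLEN:36"
    echo "bwa mem ${ref} ${out}/R1.paired.fq.gz ${out}/R2.paired.fq.gz | samtools sort -o ${out}/sample.bam -"
    echo "samtools index ${out}/sample.bam"
    echo "freebayes -f ${ref} ${out}/sample.bam > ${out}/sample.vcf"
}

# Usage: print_chain R1.fastq.gz R2.fastq.gz ref.fasta results
```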

ngseasy requires the user to create an empty run_logs folder in ngs_projects

Here is the log:

$ ngseasy -c ngseasy_test1.config.tsv -d ~/ngs_projects


###########################################################################################
#
# Program: ngseasy
# Version 1.0-r001
# Authors: Stephen Newhouse ([email protected]); Amos Folarin ([email protected])
#
# Copyright (C) 2015  Stephen Jeffrey Newhouse and Amos Folarin
# NGSeasy (aka ngseasy) Version 1.0-r001 comes with ABSOLUTELY NO WARRANTY;
# for details see the GNU General Public License.
# This is free software, and you are welcome to redistribute it under certain conditions;
# see the GNU General Public License for details.
#
###########################################################################################


Docker Installed ok
Checking Docker Version and Info


----
Docker Version
----
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64
----
Docker Info
----
Containers: 31
Images: 49
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 111
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-48-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 31.42 GiB
Name: exp-02
ID: SELY:3VB2:UG7J:IV5P:VK2F:CTYM:IW3L:RTSS:HSVD:XITZ:ZK6D:SASD
----

[Mon Oct 26 10:42:45 GMT 2015]:[NGSEASY:1.0-r001]:[Log:START]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
CONFIG FILE [-c] = ngseasy_test1.config.tsv
PROJECT DIR [-d] = /home/ansible/ngs_projects
[Mon Oct 26 10:42:46 GMT 2015]:[NGSEASY:1.0-r001]:[ngseasy]:Making log file [/home/ansible/ngs_projects/run_logs/ngseasy.1.0-r001.ngseasy_test1.config.tsv.261015.log]:[ansible]:[Linux exp-02 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]
touch: cannot touch ‘/home/ansible/ngs_projects/run_logs/ngseasy.1.0-r001.ngseasy_test1.config.tsv.261015.log’: No such file or directory
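`touch` creates a missing file, but it cannot create a missing parent directory, so it fails when run_logs does not exist. Below is a minimal sketch that creates the log file defensively, using the run_logs layout from the log above. `init_log_file` is a hypothetical helper name.

```bash
#!/bin/sh
# init_log_file PATH: create PATH's parent directory if needed, then the file.
init_log_file() {
    mkdir -p "$(dirname "$1")" && touch "$1"
}

# Usage, with the path from the log above:
# init_log_file "${HOME}/ngs_projects/run_logs/ngseasy.1.0-r001.ngseasy_test1.config.tsv.261015.log"
```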

To do

Scratch pad

hs38 primary assembly of GRCh38 (incl. chromosomes, unplaced and unlocalized contigs) and EBV
hs38a hs38 plus ALT contigs
hs38DH hs38a plus decoy contigs and HLA genes (recommended for GRCh38 mapping)
hs37 primary assembly of GRCh37 (used by 1000g phase 1) plus the EBV genome
hs37d5 hs37 plus decoy contigs (used by 1000g phase 3)

For 1.0 to 1.X

ploidy options - freebayes
SEX chrom calling XX XY Y
lcr regions options
config file reading in fixes
config file order of options
platypus options testing and tweaks
test freebayes options: should regions be callable loci from mapped reads, or just chromosomes?
Improve logging
clean up install
biobambam

Future Dev

b38 indexes - GIAB
b38 pipelines - GIAB
gui - user install updates git AWS gce pricing registration for GATK and others
capture bed files from companies
recalling pipeline
cohort pipeline
cancer pipeline
CNV pipeline
Annotation pipeline
PGRS pipeline plus reporting
chanjo
bcbio options : give user option for bcbio or speedseq
test speedseq
need parser to create options for calling bcbio or speedseq

Browsers

http://genomesavant.com/p/savant/index

Installation error 8

This is the error I have been receiving during installation. Can you please help me solve it?

Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|52.218.88.19|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-07-18 16:23:07 ERROR 404: Not Found.

Makefile:365: recipe for target 'testdata' failed
make: *** [testdata] Error 8


Installing ngseasy scripts to system...
chmod 775 ./bin/* &&
mkdir /home/root/bin &&
cp -rv ./bin/* /home/root/bin/
mkdir: cannot create directory ‘/home/root/bin’: No such file or directory
Makefile:56: recipe for target 'install' failed
make: *** [install] Error 1

Date of Download: 18/07/2020
OS and version: Linux Mint 18.3 Sylvia
Docker Version: 18.09.07
The Code you ran: make INSTALLDIR="/home/wubuntu" all
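The install recipe fails because plain `mkdir` cannot create /home/root/bin when /home/root does not exist. It would also fail on a re-run once the directory exists. Below is a minimal sketch of the same install step using `mkdir -p`, with the target directory passed in explicitly. `install_scripts` is a hypothetical helper. The report does not show which Makefile variable produced /home/root instead of the INSTALLDIR given on the command line.

```bash
#!/bin/sh
# install_scripts SRC_BIN DEST_BIN: copy the ngseasy scripts into DEST_BIN.
install_scripts() {
    src="$1"; dest="$2"
    # -p creates missing parent directories and does not fail if dest exists.
    mkdir -p "$dest" || return 1
    chmod 775 "$src"/* && cp -rv "$src"/* "$dest"/
}

# Usage: install_scripts ./bin /home/wubuntu/bin
```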

Still developing and using this?

Hi
This repo looks interesting to me. However, it does not seem to be very active, and this README line makes me hesitate:

Undergoing some re-dev...things may be broken...

Are there some dramatic changes? And should I use the current release?
Also, a 1-year release cycle is too long for me. I would prefer a half-yearly or quarterly release cycle :)

Bash script errors

Hi,
I'm Natalja from EMBL-EBI. We are trying to run the NGSeasy pipeline on our Embassy Cloud, and I found some issues. I should note that I'm not a bash script expert, so maybe I just misunderstood something, or there are differences in how scripts execute on different Linux distributions.
We are using CentOS as NFS server where NGSeasy scripts are stored and some of them are running (project initiation scripts) and Ubuntu 14.04 for docker machines.

Here is my list:

  1. A "variable VERSION is missing" error in almost all scripts. I've fixed it by renaming VERSION to NGSEASYVERSION (from the config file).
  2. The function logger_ngseasy() in some of the scripts was causing an error because it is called from inside the script(s) with only one argument (at least as I understood it). So I've changed it by adding a default value for the second argument "mylogfile".
  3. I've commented out in all scripts the following line:
    #logger_ngseasy "[${NGSEASY_STEP}]:CMD: awk 'NR >1 {print $5"/"$1}' ${config_tsv} | sort | uniq > ${HOME}/ngseasy_tmp/Project_list" ${config_run_log}
    since it was causing errors.
  4. Not sure whether this is an issue, but in order to run the scripts from the NFS server I've changed how the scripts are called, from /bin/bash <script name> to bash /<script name>. For example, /bin/bash ngseasy_initiate_project ... was changed to bash /nfs/ngseasy/bin/ngseasy_initiate_project
  5. ngseasy_initiate_project: in my case the -f option of read is not available when reading the file, so I've changed:
    "while read -f PROJECTNAME" to "while read PROJECTNAME"
  6. ngseasy_fastqc: all lines similar to this one if [[ "${GENOMEBUILD}" -eq "b37" ]]; I've changed to if [[ "${GENOMEBUILD}" = "b37" ]]; since "=" is used for string comparison and "-eq" caused error.
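Items 2, 5 and 6 can each be shown in a few lines. Below is a minimal sketch that assumes a simplified `logger_ngseasy` taking a message and an optional log file. The real function's arguments and default log path may differ, and `NGSEASY_LOG`, `is_b37` and `count_projects` are hypothetical names.

```bash
#!/bin/sh
# Item 2: give the second argument a default, so one-argument calls work.
logger_ngseasy() {
    message="$1"
    mylogfile="${2:-${NGSEASY_LOG:-/dev/null}}"
    printf '[%s]:%s\n' "$(date)" "$message" >> "$mylogfile"
}

# Item 6: "=" compares strings. "-eq" compares integers, so a value like
# "b37" either raises an error or is silently treated as a number.
is_b37() {
    [ "$1" = "b37" ]
}

# Item 5: plain "read" (with -r so backslashes are kept) reads one name
# per line; "|| [ -n ... ]" also catches a last line without a newline.
count_projects() {
    n=0
    while read -r PROJECTNAME || [ -n "$PROJECTNAME" ]; do
        [ -n "$PROJECTNAME" ] && n=$((n + 1))
    done < "$1"
    echo "$n"
}
```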

I suppose that's it for now. Sorry if some of the issues are relevant only for our environment.

It would also be nice to discuss parallelization with you. If I've understood everything correctly, only the fastqc jobs run in parallel; all other steps run one by one. We are building a Docker cluster, so maybe we could work with you on an effective solution for the alignment and other steps?

-Natalja

Test if dataset exists locally before downloading

The Makefile seems to re-download the same datasets if it is run again.

  • test whether a file already exists before downloading it from Amazon

Also, handle problems more gracefully (e.g. a timeout or a similar transient issue).

NGSeasy Calling issue

Hello

I am a beginner in programming.
I installed the tool successfully, and I even ran the test and looked at the VCF output.
But after closing the terminal, I came back to start a run with my own data, and I received an error that I do not understand:

Calling ngseasy_trimmomatic
/bin/bash: ngseasy_trimmomatic: No such file or directory
Calling ngseasy_alignment
/bin/bash: ngseasy_alignment: No such file or directory
Calling ngseasy_realn
/bin/bash: ngseasy_realn: No such file or directory
Calling ngseasy_bsqr
/bin/bash: ngseasy_bsqr: No such file or directory
Calling ngseasy_variant_calling
/bin/bash: ngseasy_variant_calling: No such file or directory

Can you please help me?

Best
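`/bin/bash ngseasy_trimmomatic: No such file or directory` means bash found the script neither in the current directory nor on PATH. One likely cause is that the install bin directory was on PATH only in the earlier terminal session. Below is a minimal sketch for checking this, using the script names from the output above. The `$HOME/bin` location in the usage comment is an assumption; use whatever directory the install step copied the scripts to.

```bash
#!/bin/sh
# missing_scripts: print each ngseasy step script that cannot be found on PATH.
missing_scripts() {
    for s in ngseasy_trimmomatic ngseasy_alignment ngseasy_realn \
             ngseasy_bsqr ngseasy_variant_calling; do
        command -v "$s" >/dev/null 2>&1 || echo "$s"
    done
}

# Usage: if anything is printed, add the install bin directory to PATH
# permanently (adjust the path to your install location), e.g.:
#   echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
```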

ERROR 403 when downloading hg19 file

I launched an EC2 machine and cloned ngseasy on it. The make all step worked fine until it stopped at the hg19 rule. It seems to be a permission problem with the target Amazon files.

2015-04-24 09:32:15 (74.8 MB/s) - ‘nexterarapidcapture_exome_targetedregions_v1.2.bed’ saved [5116156/5116156]

--2015-04-24 09:32:15--  https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_b37/nexterarapidcapture_expandedexome_targetedregions.bed
Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... 54.231.136.244
Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|54.231.136.244|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10907040 (10M) [binary/octet-stream]
Saving to: ‘nexterarapidcapture_expandedexome_targetedregions.bed’

100%[===========================================================================================================================================>] 10,907,040  63.8MB/s   in 0.2s   

2015-04-24 09:32:15 (63.8 MB/s) - ‘nexterarapidcapture_expandedexome_targetedregions.bed’ saved [10907040/10907040]

cd /home/ubuntu/ngs_projects && \
    mkdir reference_genomes_hg19 && \
    cd /home/ubuntu/ngs_projects/reference_genomes_hg19 && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/1000G_omni2.5.hg19.sites.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/1000G_omni2.5.hg19.sites.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/1000G_phase1.indels.hg19.sites.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/1000G_phase1.indels.hg19.sites.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/Genome && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/GenomeIndex && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/GenomeIndexHash && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz.tbi && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.sites.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.sites.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/NA12878.knowledgebase.snapshot.20131119.hg19.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/NA12878.knowledgebase.snapshot.20131119.hg19.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/OverflowTable && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/dbsnp_138.hg19.excluding_sites_after_129.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/dbsnp_138.hg19.recab && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/dbsnp_138.hg19.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/dbsnp_138.hg19.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/get_hg19.sh && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/get_hg19_others.sh && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/hapmap_3.3.hg19.sites.vcf && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/hapmap_3.3.hg19.sites.vcf.idx && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/hg19.genome && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/index_bowtie.sh && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/index_bwa.sh && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/index_novo.sh && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/index_snap.sh && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/index_stampy.sh && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/nohup.out && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19-bs.umfa && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.1.bt2 && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.2.bt2 && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.3.bt2 && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.4.bt2 && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.dict && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fai && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.amb && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.ann && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.bwt && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.fai && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.fai.gz && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.gz && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.novoindex && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.pac && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.fasta.sa && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.rev.1.bt2 && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.rev.2.bt2 && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.sthash && \
    wget https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/ucsc.hg19.stidx && \
    chmod -R 777 /home/ubuntu/ngs_projects/reference_genomes_hg19/
--2015-04-24 09:32:15--  https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19/
Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... 54.231.136.244
Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|54.231.136.244|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2015-04-24 09:32:15 ERROR 403: Forbidden.
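The 403 in the last lines comes from the first `wget` in the recipe. That call requests the bare prefix `.../reference_genomes_hg19/` with no file name, and the same line appears twice. When the requester lacks list permission on the bucket, S3 typically answers a request for a non-existent key with 403 rather than 404. Because every download in the recipe is chained with `&&`, that first failure stops the whole rule. Below is a minimal sketch that builds the download list from file names and skips blank entries. `hg19_urls` is a hypothetical helper, and the base URL is the one used in the recipe.

```bash
#!/bin/sh
BASE_URL=https://s3-eu-west-1.amazonaws.com/ngseasy.data/reference_genomes_hg19

# hg19_urls FILE_LIST: print one full URL per non-empty file name in FILE_LIST,
# so a blank entry can never become a request for the bare prefix URL.
hg19_urls() {
    while read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        echo "${BASE_URL}/${name}"
    done < "$1"
}

# Usage: download each listed file, skipping ones already present (-nc):
# hg19_urls hg19_files.txt | while read -r url; do wget -nc "$url"; done
```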

About downloading time and running errors

Hi,

In readme, it says:

Install can take a while, 1-2 hours, so go get a coffee
just chill...
if your network is bad...then who knows how long...
still..just chill...

But the command

make INTSALLDIR="/home/ec2-user" all

takes more than a week to download. My download speed is about 3 MB/s; is that normal?
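Here is a rough sanity check on those numbers. A sustained 3 MB/s for one week (604,800 s) moves about 1,814,400 MB, roughly 1.8 TB. Comparing that with the actual size of the data being fetched helps separate a slow link from stalls or repeated downloads. Below is a small helper for the arithmetic, using decimal units and integer shell arithmetic. `download_hours` is a hypothetical name.

```bash
#!/bin/sh
# download_hours SIZE_GB RATE_MBPS: estimated hours to fetch SIZE_GB gigabytes
# at RATE_MBPS megabytes per second (decimal units, rounded up).
download_hours() {
    size_mb=$(( $1 * 1000 ))
    secs=$(( (size_mb + $2 - 1) / $2 ))
    echo $(( (secs + 3599) / 3600 ))
}

# Example: download_hours 100 3 prints 10 (about 33,334 s, rounded up to hours)
```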

Another question: when I tried to run

ngseasy -c ngseasy_test.config.tsv -d /home/ec2-user/ngs_projects

I got these errors in the run log:

Exception in thread "main" java.io.FileNotFoundException: /home/pipeman/ngs_projects/Test_NGS/NA12878/fastq/illumina.100bp.pe.wex.30x_1.fastq.gz (Permission denied)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:127)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:251)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:498)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:35)

I ran the command with sudo, so I do not know why there is a permission-denied exception.
The file listed in the exception is under /home/root/*; is that the problem?

And at the end of the run log, it says:

[Sat Aug 8 22:03:07 UTC 2015]:[NGSEASY:1.0-r001]:[ngseasy_variant_calling]:ERROR:Can not find required BAM File for Variant Calling:[root]:[Linux ip-172-31-23-78 3.13.0-61-generic #100-Ubuntu SMP Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux]

Thank you very much!

Defaults to placing log and tmp dirs in home dir

ngsprojectdir:
	@echo "Make Top level project directories"
	mkdir -v -p $(INSTALLDIR)/ngs_projects && \
	mkdir -v -p $(INSTALLDIR)/ngs_projects/raw_fastq && \
	mkdir -v -p $(INSTALLDIR)/ngs_projects/config_files && \
	mkdir -v -p $(INSTALLDIR)/ngs_projects/run_logs && \
	mkdir -v -p $(INSTALLDIR)/ngs_projects/ngseasy_resources && \
	mkdir -v -p $(HOME)/ngseasy_logs && \
	mkdir -v -p $(HOME)/ngseasy_tmp

Why should log and tmp be put in the home dir?
mkdir -v -p $(HOME)/ngseasy_logs
mkdir -v -p $(HOME)/ngseasy_tmp
