SURPIrt v1.0

Hardware & Software Requirements

• Linux server, tested on Ubuntu 16.04 with 512 GB memory and an 18 TB shared disk volume

Additional Software Dependencies

• Python interpreter, tested with Python v2.7.12
• Perl interpreter, tested with Perl v5.22.1

Required Scripts

Linux shell scripts
	• blast_ncores.sh [OBFUSCATED]
	• extractAlltoFast.sh [OBFUSCATED]
	• FastaToTab.csh [OBFUSCATED]
	• filterOverlapSAM.sh [OBFUSCATED] 
	• split_barcodes.sh [OBFUSCATED]
	• SURPIrt.sh [OBFUSCATED]
	• SURPIrt_viz.sh [OBFUSCATED]
	• TabToFasta.csh [OBFUSCATED]
	• taxonomy_annotation.sh [OBFUSCATED]

Python scripts
	• classify_annotated.py [BINARIZED]
	• mask_primers.py [BINARIZED]
	• parse_overlapping.py [BINARIZED]
	• trim_primers.py [BINARIZED]

Perl scripts
	• fasta_to_fastq.pl
	• subtractBlastFromFasta.pl

C/C++ executables
	• fqextract_5m

Instructions for Installing and Running the SURPIrt Software

  1. The reference databases used by SURPIrt for identification of human, bacterial, fungal, and parasitic reads and for taxonomy lookup are not provided in the GitHub distribution. They need to be regenerated as follows:

    • The reference headers for the fasta reference database are provided in the /reference_headers subdirectory. Note that the reference headers can be in either gi or accession number format and also may include extraneous descriptive text. Use the reference headers to reconstruct the individual fasta files and place them in the directory structure as described in the README file.

    • The subdirectory /taxonomy_files contains the CSV-formatted file lineages-2019-01-20.csv. Instructions for generating the second taxonomy file, nucl_all_sorted_LCall.txt, are provided in the README file. Both files need to be placed in the $taxonomy_folder (default /reference/surpirt/taxonomy).
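Because the reference headers may be in either gi or accession number format and may carry descriptive text, a useful first step when rebuilding the FASTA files is to reduce each header to a bare identifier. The following is a minimal sketch and not part of the SURPIrt distribution. The function name `extract_ids` and the header layouts listed in the comments are assumptions, and should be checked against the actual files in /reference_headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce FASTA-style header lines to bare identifiers.
# Assumed layouts (verify against the real header files):
#   >gi|123456|ref|NC_000001.1| some description   -> NC_000001.1
#   >gi|123456 some description                    -> 123456 (no accession present)
#   >NC_000001.1 some description                  -> NC_000001.1
extract_ids() {
    sed -e 's/^>//' "$1" | awk '
        NF == 0 { next }                  # skip blank lines
        {
            split($1, tok, "|")           # first whitespace-delimited token, split on "|"
            if (tok[1] == "gi") {
                # prefer the accession (4th field) when present, else the gi number
                print (tok[4] != "" ? tok[4] : tok[2])
            } else {
                print tok[1]              # header already starts with an accession
            }
        }'
}
```

For example, `extract_ids headers.txt | sort -u > ids.txt` (both filenames are placeholders) would give a de-duplicated identifier list to use when retrieving the sequences.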

  2. Once the human/microbial reference and taxonomy lookup databases have been generated and placed in their appropriate directories, the pipeline can be run using the SURPIrt.sh script with the following command-line switches:

SURPIrt version 1.0

This program will run the SURPIrt pipeline.

Command Line Switches:

	-h	Show this help & ignore all other switches

	-r	Specify reference folder [optional - default: "/reference/surpirt"]

	-f 	Specify input FASTQ [required]

	-v	Execute pipeline in virus-only mode

		This is implemented for speed, if only looking for viruses.

	-w	Create files necessary for SURPIviz

	-x	Execute pipeline in verification-only mode.

		This mode will verify all database locations, but not execute the pipeline.

	-t	Specify number of threads to use [optional - will be set to number of cores if unspecified]

	-c	Specify config file [optional]

		This switch is used to initiate a SURPIrt run using a specified config file. Any parameters 
		in the config file will supersede default parameters within the pipeline.
		
		When using a config file, it is best to avoid using other command-line parameters. Instead, all
		parameters should be included with the config file.

	-z	Create default config file. [optional] (specify fastq filename)
		This option will create a standard .config file, and go file.
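The switches above can be combined in a small wrapper that checks the input before launching a run. This is a sketch under assumptions, not part of the distribution: the function name `run_surpirt`, the `RUN` environment variable, and the fallback of 8 threads are hypothetical, and only the documented -r, -f, and -t switches are used. By default it prints the command rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented switches; not part of SURPIrt.
# Prints the assembled command by default; set RUN=1 to execute it.
run_surpirt() {
    local fastq=$1
    local threads=${2:-8}                  # example thread count, adjust to the host
    local ref=${3:-/reference/surpirt}     # documented default for -r
    if [ ! -r "$fastq" ]; then
        echo "error: cannot read FASTQ file: $fastq" >&2
        return 1
    fi
    # Build the argument list once so the printed and executed forms match.
    set -- SURPIrt.sh -r "$ref" -f "$fastq" -t "$threads"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

A call such as `run_surpirt sample.fastq 16` prints the command for review, and `RUN=1 run_surpirt sample.fastq 16` executes it. When a config file is used with -c, the help text recommends putting all parameters in that file, so a wrapper like this would not apply.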

Test Run

  1. A sample test file named ZIKV-nohuman.fastq is provided. It is a metagenomic run of a ZIKV clinical sample with the human reads removed [n=517 sequences].

  2. Using the default reference directory of /reference/surpirt, run the SURPIrt.sh script from the command line with the following parameters (using 8 threads/cores):

SURPIrt.sh -f "ZIKV-nohuman.fastq" -t 8
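Before starting the test run, it can help to confirm that the input file is intact by counting its records. The sketch below assumes an uncompressed FASTQ file with the common four-lines-per-record layout. The function name `count_fastq_reads` is hypothetical and not part of SURPIrt.

```shell
#!/usr/bin/env bash
# Count records in an uncompressed FASTQ file, assuming four lines per
# record (no wrapped sequence or quality lines).
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1") || return 1      # fails if the file cannot be read
    lines=$((lines))                       # normalise any padding from wc
    if [ $((lines % 4)) -ne 0 ]; then
        echo "warning: line count $lines is not a multiple of 4" >&2
    fi
    echo $((lines / 4))
}
```

If the sample file matches the description above and follows this layout, `count_fastq_reads ZIKV-nohuman.fastq` would be expected to report 517.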
