Coder Social home page Coder Social logo

bioinfo's Introduction

Bioinformatics tools

More details are provided within each subfolder.

pileVar.pl identifies variants based on mpileup output from samtools. This tool is currently geared toward identifying fixed differences and regions of ambiguity due to indels.

Plot simple schematic of annotation features described in a GFF file.

Plot spatially-distributed pie charts.

A handful of bioinformatics scripts:

fastaSort sorts a file of Fasta sequences by identifier name, naturally, so that ĂŹd_1 is followed by id_2, rather than id_10. Requires BioPerl and the Perl module Sort::Naturally.

gffSort is a small Bash script that sorts a GFF file first by sequence name in column 1, and then by position numerically in column 4. It assumes you might have generated this GFF with MAKER, so it removes the ### lines between gene models if they are present.

stacksExtractStats.pl reads a Stacks log file and produce a table summarizing sample-specific statistics. It currently prints number of RAD-Tags, number of stacks and mean stack coverage, and it is written so that other statistics can easily be harvested from the output.

mummer2Vcf.pl reads file of SNPs and indels called by the Mummer program show-snps -T and produce a VCF-ish file, collapsing consecutive indel characters into a single indel and adding the missing first base from indels by reading from the reference sequence file. The format produced is not yet compliant VCF yet but it will be. Requires BioPerl.

subsampleReads.pl randomly selects a fraction of FastQ-format reads.

phredDectector.pl attempts to determine the Phred-scale quality encoding used in a FastQ-format file. If everything looks reasonable, it simply prints 33, 64 or 59 (this last for older Solexa sequences) to stdout. Does not require BioPerl.

fermiExtractContigs.pl creates Fasta-format contigs from a fermi-format *.fq.gz FastQ-like scaftig files. Writes Fasta to stdout, giving each Fasta sequence its fermi sequence name and a description that includes sequence length, number of non-redundant reads that built the scaftig, and median coverage of non-redundant reads along the scafftig. Requires BioPerl.

fermiExtractContigs_simple.sh creates Fasta-format contigs from a fermi-format *.fq.gz FastQ-like scaftig files. Unlike fermiExtractContigs.pl, this does not use Perl. Writes Fasta to stdout, giving each Fasta sequence its fermi sequence name and a description that includes sequence length and the number of non-redundant reads that built the scaftig.

windowWig reads a data stream (for example, coverage values by position within reference sequences) and produces a USCS WIG file that summarizes median values within nonoverlapping windows.

intervalBed reads a data stream with reference-position optionally marked with boolean values (for example, presence-absence by position within reference sequences) and produces a BED file describing intervals in which the values are true.

samHeader2Bed.pl reads a SAM header and produces BED file(s) after applying a few filtering criteria.

pileup2pro.pl reads samtools mpileup format and produces a profile file suitable for input to mlRho.

mergePileupColumns merges columns from each BAM in multi-BAM samtools mpileup output into single columns.

extractFastaSeqs.pl extracts named sequences from a FASTA file, or everything but. Does not use BioPerl.

extractFasta.pl extracts named FASTA sequences from a makeblastdb -parse_seqids-built blast database, and optionally provide a range for a subsequence. Uses BioPerl.

extractFasta extracts named FASTA sequences from a FASTA file indexed with BioPerl's Bio::DB::Fasta

trimFastq.pl hard-trims a given amount from the 5' or 3' end (or both) from each read in a FastQ-format file, and optionally trims each read from the 3' end to a maximum length.

shuffleFastq.pl and deshuffleFastq.pl convert FastQ-format files from separate read 1/read 2 files to interleaved and back. These are based on similar scripts provided with velvet.

fastaGC.pl analyses GC content of Fasta-format sequences a few different ways.

convertSequence.pl and convertAlignment.pl convert between sequence and alignment formats using BioPerl. convertAlignment.pl can also convert aligned sequences to degapped unaligned sequences. If you need to line-wrap a Fasta file, use

convertSequence.pl -f fasta file.fa -of fasta - > outfile.fa

gmhmmp2Fasta.pl and gmhmmp2Table.pl extract Fasta sequences and a summary table from output produced by the ORF-finding tool gmhmmp.

cutadaptReportScript.sh collects results from *.cutReport files produced by cutadapt to quicklyl produce a table of adapter trimming results.

bioinfo's People

Contributors

douglasgscofield avatar

Watchers

James Cloos avatar linsi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.