wholtz / biopython-convert

This project is forked from brinkmanlab/biopython-convert

Tool to interconvert between various bioinformatics formats that BioPython supports

License: Other

BioPython-Convert

Interconvert various file formats supported by BioPython.

Supports querying records with JMESPath.

Installation

pip install biopython-convert

or:

conda install biopython-convert

or:

git clone https://github.com/brinkmanlab/BioPython-Convert.git
cd BioPython-Convert
./setup.py install

Use

biopython.convert [-s] [-v] [-i] [-q JMESPath] input_file input_type output_file output_type
    -s Split records into separate files
    -q JMESPath query to select records. Must return a list of SeqIO records or mappings. The root is the list of input SeqIO records.
    -i Print out details of records during conversion
    -v Print version and exit

Supported formats

abi, abi-trim, ace, cif-atom, cif-seqres, clustal, embl, fasta, fasta-2line, fastq-sanger, fastq, fastq-solexa, fastq-illumina, genbank, gb, ig, imgt, nexus, pdb-seqres, pdb-atom, phd, phylip, pir, seqxml, sff, sff-trim, stockholm, swiss, tab, qual, uniprot-xml, gff3, txt, json, yaml
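
The synopsis above can also be driven from a script. The following is a minimal sketch, assuming the `biopython.convert` executable is installed and on the PATH; the helper name `build_convert_command` and the file names are hypothetical and only illustrate the argument order given in the synopsis.

```python
import shlex


def build_convert_command(input_file, input_type, output_file, output_type,
                          query=None, split=False, info=False):
    """Assemble an argument list that follows the documented synopsis.

    Hypothetical helper: it only arranges arguments, it does not run anything.
    """
    argv = ["biopython.convert"]
    if split:
        argv.append("-s")        # split records into separate files
    if info:
        argv.append("-i")        # print record details during conversion
    if query is not None:
        argv += ["-q", query]    # JMESPath query selecting records
    argv += [input_file, input_type, output_file, output_type]
    return argv


# Example: keep only the first record of a GenBank file and write it as FASTA.
cmd = build_convert_command("genome.gb", "genbank", "genome.fasta", "fasta",
                            query="[0]")
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

If the tool is installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with the brackets and quotes that JMESPath queries usually contain.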

The root node for a query is a list of SeqRecord objects. The query can return a list containing a subset of these records, or mappings whose keys correspond to the constructor parameters of a SeqRecord object.

If the output format is txt, json, or yaml, the object resulting from the JMESPath query is simply dumped in that format.

A web-based tool is available for experimenting with queries in real time on your own data. Convert your dataset to JSON and load it into the JMESPath playground to begin composing your query. The playground can load JSON files directly, so there is no need to copy and paste the data.

The split() and let() functions are available in addition to the standard JMESPath functions.

extract(Seq, SeqFeature) is also available, giving queries access to the SeqFeature.extract() function.
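
To give a feel for what extracting a feature from its parent sequence means, here is a simplified stand-in written in plain Python. It is not Biopython's implementation: the function name `extract_simple` is hypothetical, it handles only a single contiguous location on an unambiguous DNA string, and it assumes Biopython's convention of 0-based, end-exclusive coordinates with strand `-1` meaning the reverse complement.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_simple(seq, start, end, strand=1):
    """Return the subsequence covered by one contiguous feature location.

    Simplified illustration only: compound locations and ambiguity codes
    are not handled.
    """
    sub = seq[start:end]                       # 0-based, end-exclusive slice
    if strand == -1:
        sub = sub.translate(_COMPLEMENT)[::-1]  # reverse complement
    return sub


parent = "AAACCCGGGTTT"
forward = extract_simple(parent, 3, 9)         # bases 4..9 in 1-based terms
reverse = extract_simple(parent, 0, 6, strand=-1)
print(forward, reverse)
```

The same coordinate convention explains why the example queries add 1 to `location.start` when printing positions for human-readable output.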

Examples:

Append a new record:

[@, [{'seq': 'AAAA', 'name': 'my_new_record'}]] | []

Filter out any plasmids:

[?!(features[?type=='source'].qualifiers.plasmid)]
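
For readers new to JMESPath, the filter above can be read as the following plain-Python sketch. The dictionary layout is an assumption made for illustration, loosely modelled on how records look after conversion to JSON; the record contents and the helper name `has_plasmid_qualifier` are hypothetical.

```python
# Hypothetical records shaped like the structure the query navigates.
records = [
    {"id": "chromosome",
     "features": [{"type": "source",
                   "qualifiers": {"organism": ["Example organism"]}}]},
    {"id": "pExample",
     "features": [{"type": "source",
                   "qualifiers": {"organism": ["Example organism"],
                                  "plasmid": ["pExample"]}}]},
]


def has_plasmid_qualifier(record):
    """True if any 'source' feature carries a 'plasmid' qualifier.

    Mirrors features[?type=='source'].qualifiers.plasmid, which yields a
    non-empty list exactly when such a qualifier is present.
    """
    return any(
        feature.get("type") == "source"
        and feature.get("qualifiers", {}).get("plasmid") is not None
        for feature in record.get("features", [])
    )


# The leading ! in the query negates the test, keeping non-plasmid records.
kept = [record for record in records if not has_plasmid_qualifier(record)]
print([record["id"] for record in kept])
```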

Keep only the first record:

[0]

Output taxonomy of each record (txt output):

[*].annotations.taxonomy

Output a JSON object containing the id and molecule type of each record:

[*].{id: id, type: annotations.molecule_type}

Convert dataset to PTT format using text output:

[0].[join(' - 1..', [description, to_string(length(seq))]), join(' ', [to_string(length(features[?type=='CDS' && qualifiers.translation])), 'proteins']), join(`"\t"`, ['Location', 'Strand', 'Length', 'PID', 'Gene', 'Synonym', 'Code', 'COG', 'Product']), (features[?type=='CDS' && qualifiers.translation].[join('..', [to_string(sum([location.start, `1`])), to_string(location.end)]), [location.strand][?@==`1`] && '+' || '-', length(qualifiers.translation[0]), (qualifiers.db_xref[?starts_with(@, 'GI')].split(':', @)[1])[0] || '-', qualifiers.gene[0] || '-', qualifiers.locus_tag[0] || '-', '-', '-', qualifiers.product[0] ] | [*].join(`"\t"`, [*].to_string(@)) )] | []

Convert dataset to faa format using fasta output:

[0].let({org: (annotations.organism || annotations.source)}, &(features[?type=='CDS' && qualifiers.translation].{id:
join('|', [
        (qualifiers.db_xref[?starts_with(@, 'GI')].['gi', split(':', @)[1]]),
        (qualifiers.protein_id[*].['ref', @]),
        (qualifiers.locus_tag[*].['locus', @]),
        join('', [':', [location][?strand==`-1`] && 'c' || '', to_string(sum([location.start, `1`])), '..', to_string(location.end)])
][][]),
seq: qualifiers.translation[0],
description: (org && join('', [qualifiers.product[0], ' [', org, ']']) || qualifiers.product[0])}))

See CONTRIBUTING.rst for information on contributing to this repo.
