@gmod/gtf

GTF, the Gene Transfer Format (sometimes expanded as General Transfer Format), is identical to GFF version 2. This module reads and writes GTF data and aims to be a complete implementation of the GTF specification.

  • streaming parsing and streaming formatting
  • creates transcript features with child_features
  • only compatible with GTF

Note: For JBrowse, we generally encourage GFF3 over GTF.

For GFF3, check out the @gmod/gff-js package.

Install

$ npm install --save @gmod/gtf

Usage

import gtf from '@gmod/gtf'

// parse a file from a file name
gtf.parseFile('path/to/my/file.gtf', { parseAll: true })
.on('data', data => {
  if (data.directive) {
    console.log('got a directive',data)
  }
  else if (data.comment) {
    console.log('got a comment',data)
  }
  else if (data.sequence) {
    console.log('got a sequence from a FASTA section')
  }
  else {
    console.log('got a feature',data)
  }
})

// parse a stream of GTF text
import fs from 'fs'
fs.createReadStream('path/to/my/file.gtf')
.pipe(gtf.parseStream())
.on('data', data => {
  console.log('got item',data)
  return data
})
.on('end', () => {
  console.log('done parsing!')
})

// parse a string of gtf synchronously
let stringOfGTF = fs
  .readFileSync('my_annotations.gtf')
  .toString()
let arrayOfThings = gtf.parseStringSync(stringOfGTF)

// format an array of items to a string
let formattedGTF = gtf.formatSync(arrayOfThings)

// format a stream of things to a stream of text.
// inserts sync marks automatically.
// note: this could create new gtf lines for transcript features
myStreamOfGTFObjects
.pipe(gtf.formatStream())
.pipe(fs.createWriteStream('my_new.gtf'))

// format a stream of things and write it to
// a gtf file. inserts sync marks.
// formatFile takes the stream as its first argument and returns a promise
// note: this could create new gtf lines for transcript features
gtf.formatFile(myStreamOfGTFObjects, 'path/to/destination.gtf')

Object format

features

Because GTF cannot represent a three-level hierarchy (gene -> transcript -> exon), we parse GTF by creating transcript features with child features.

We do not create features from the gene_id. Values given as . in the GTF become null in the output.

ctgA	bare_predicted	CDS	10000	11500	.	+	0	transcript_id "Apple1";

Note that this creates an additional transcript feature from the transcript_id when featureType is not 'transcript', and then creates a child CDS feature from the line of GTF shown above.

[
    [
        {
            "seq_name": "ctgA",
            "source": "bare_predicted",
            "featureType": "transcript",
            "start": 10000,
            "end": 11500,
            "score": null,
            "strand": "+",
            "frame": "0",
            "attributes": { "transcript_id": [ "\"Apple1\"" ] },
            "child_features": [[
                {
                    "seq_name": "ctgA",
                    "source": "bare_predicted",
                    "featureType": "CDS",
                    "start": 10000,
                    "end": 11500,
                    "score": null,
                    "strand": "+",
                    "frame": "0",
                    "attributes": { "transcript_id": [ "\"Apple1\"" ] },
                    "child_features": [],
                    "derived_features": []
                }
            ]],
            "derived_features": []
        }
    ]
]
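
The mapping from the GTF line to the nested output above can be sketched in plain JavaScript. This is an illustrative, standalone sketch and not the library's implementation: parseLine and wrapInTranscript are hypothetical helper names, and the attribute splitting is deliberately naive (it would break on a ; inside a quoted value).

```javascript
// Standalone sketch: parseLine and wrapInTranscript are hypothetical helpers,
// not functions exported by @gmod/gtf.
const dotToNull = value => (value === '.' ? null : value)
const numberOrNull = value => (value === '.' ? null : Number(value))

function parseLine(line) {
  const cols = line.replace(/\r?\n$/, '').split('\t')
  // Column 9 holds `key "value";` pairs. Values keep their quotes,
  // matching the output shown above.
  const attributes = {}
  for (const pair of (cols[8] || '').split(';')) {
    const match = pair.trim().match(/^(\S+)\s+(.*)$/)
    if (match) {
      const [, key, value] = match
      if (!attributes[key]) attributes[key] = []
      attributes[key].push(value)
    }
  }
  return {
    seq_name: dotToNull(cols[0]),
    source: dotToNull(cols[1]),
    featureType: dotToNull(cols[2]),
    start: numberOrNull(cols[3]),
    end: numberOrNull(cols[4]),
    score: numberOrNull(cols[5]),
    strand: dotToNull(cols[6]),
    frame: dotToNull(cols[7]),
    attributes,
    child_features: [],
    derived_features: [],
  }
}

// Wrap a non-transcript feature in a synthetic transcript parent.
function wrapInTranscript(feature) {
  if (feature.featureType === 'transcript') return feature
  return {
    ...feature,
    featureType: 'transcript',
    attributes: { ...feature.attributes },
    child_features: [[feature]],
  }
}

const cdsLine =
  'ctgA\tbare_predicted\tCDS\t10000\t11500\t.\t+\t0\ttranscript_id "Apple1";'
console.log(JSON.stringify([[wrapInTranscript(parseLine(cdsLine))]], null, 2))
```

The score column is . in the input, so it becomes null, while frame stays the string "0" as in the output above.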

directives, comments, sequences

parseDirective("##gtf\n")
// returns
{
  "directive": "gtf",
}

parseComment('# hi this is a comment\n')
// returns
{
  "comment": "hi this is a comment"
}

// Sequence objects come from any embedded `##FASTA` section in the GTF file.
{
  "id": "ctgA",
  "description": "test contig",
  "sequence": "ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA"
}
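
As a rough illustration of how lines could be told apart before parsing, the sketch below classifies a line by its leading # characters and reproduces the directive and comment objects shown above. classifyLine is a hypothetical name, not part of the package, and the sketch does not handle ### sync marks or FASTA sections.

```javascript
// Standalone sketch: classifyLine is a hypothetical helper, not library API.
function classifyLine(line) {
  const text = line.replace(/\r?\n$/, '')
  // Two leading '#' characters mark a directive, one marks a comment.
  if (text.startsWith('##')) return { directive: text.slice(2).trim() }
  if (text.startsWith('#')) return { comment: text.slice(1).trim() }
  // Anything else is treated as a feature line and left unparsed here.
  return { featureLine: text }
}

console.log(classifyLine('##gtf\n'))
console.log(classifyLine('# hi this is a comment\n'))
```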

API

parseStream

Parse a stream of text data into a stream of feature, directive, and comment objects.

Parameters

  • options Object optional options object (optional, default {})

    • options.encoding string text encoding of the input GTF. default 'utf8'
    • options.parseAll boolean default false. if true, will parse all items. overrides other flags
    • options.parseFeatures boolean default true
    • options.parseDirectives boolean default false
    • options.parseComments boolean default false
    • options.parseSequences boolean default true
    • options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000

Returns ReadableStream stream (in objectMode) of parsed items
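
The interaction between parseAll and the other flags can be pictured with a small option-resolution sketch. The defaults are copied from the parameter list above; resolveOptions itself is a hypothetical helper and is only assumed to resemble what the library does internally.

```javascript
// Standalone sketch: resolveOptions is hypothetical, not exported by @gmod/gtf.
// Default values are taken from the documented parameter list.
const defaultOptions = {
  encoding: 'utf8',
  parseAll: false,
  parseFeatures: true,
  parseDirectives: false,
  parseComments: false,
  parseSequences: true,
  bufferSize: 1000,
}

function resolveOptions(options = {}) {
  const resolved = { ...defaultOptions, ...options }
  // parseAll overrides the individual parse* flags.
  if (resolved.parseAll) {
    resolved.parseFeatures = true
    resolved.parseDirectives = true
    resolved.parseComments = true
    resolved.parseSequences = true
  }
  return resolved
}

console.log(resolveOptions({ parseAll: true, parseComments: false }))
```

With parseAll set, an explicit parseComments: false is overridden, which is what "overrides other flags" is taken to mean here.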

parseFile

Read and parse a GTF file from the filesystem.

Parameters

  • filename string the filename of the file to parse

  • options Object optional options object

    • options.encoding string the file's string encoding, defaults to 'utf8'
    • options.parseAll boolean default false. if true, will parse all items. overrides other flags
    • options.parseFeatures boolean default true
    • options.parseDirectives boolean default false
    • options.parseComments boolean default false
    • options.parseSequences boolean default true
    • options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000

Returns ReadableStream stream (in objectMode) of parsed items

parseStringSync

Synchronously parse a string containing GTF and return an array of the parsed items.

Parameters

  • str string

  • inputOptions Object optional options object (optional, default {})

    • inputOptions.parseAll boolean default false. if true, will parse all items. overrides other flags
    • inputOptions.parseFeatures boolean default true
    • inputOptions.parseDirectives boolean default false
    • inputOptions.parseComments boolean default false
    • inputOptions.parseSequences boolean default true

Returns Array array of parsed features, directives, and/or comments

formatSync

Format an array of GTF items (features, directives, comments) into a string of GTF. Does not insert synchronization (###) marks. Does not insert a directive if one is not already there.

Parameters

  • items

Returns String the formatted GTF

formatStream

Format a stream of items (of the type produced by this script) into a stream of GTF text.

Inserts synchronization (###) marks automatically.

Parameters

  • options Object

    • options.minSyncLines Number minimum number of lines between ### marks. default 100
    • options.insertVersionDirective Boolean if the first item in the stream is not a ##gtf directive, insert one. default false
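
One way to picture the effect of minSyncLines is the sketch below. It assumes that each item formats to a group of lines and that a ### mark may only be placed between items, once at least minSyncLines lines have been written since the previous mark. withSyncMarks is a hypothetical helper; the library's actual placement logic may differ.

```javascript
// Standalone sketch: withSyncMarks is hypothetical, not exported by @gmod/gtf.
// itemsLines is an array of items, each item being an array of formatted lines.
function withSyncMarks(itemsLines, minSyncLines = 100) {
  const output = []
  let linesSinceSync = 0
  for (const lines of itemsLines) {
    output.push(...lines)
    linesSinceSync += lines.length
    // Only emit a mark between items, never inside one.
    if (linesSinceSync >= minSyncLines) {
      output.push('###')
      linesSinceSync = 0
    }
  }
  return output
}

console.log(withSyncMarks([['a', 'b'], ['c'], ['d', 'e']], 3).join('\n'))
```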

formatFile

Format a stream of items (of the type produced by this script) into a GTF file and write it to the filesystem.

Inserts synchronization (###) marks and a ##gtf directive automatically (if one is not already present).

Parameters

  • stream ReadableStream the stream to write to the file

  • filename String the file path to write to

  • options Object (optional, default {})

    • options.encoding String default 'utf8'. encoding for the written file
    • options.minSyncLines Number minimum number of lines between sync (###) marks. default 100
    • options.insertVersionDirective Boolean if the first item in the stream is not a ##gtf directive, insert one. default false

Returns Promise promise for the written filename

util

unescape

Unescape a string/text value used in a GTF attribute. Textual attributes should be surrounded by double quotes. Source info: https://mblab.wustl.edu/GTF22.html and https://en.wikipedia.org/wiki/Gene_transfer_format

Parameters

Returns String

_escape

Escape a value for use in a GTF attribute value.

Parameters

Returns String

escapeColumn

Escape a value for use in a GTF column value.

Parameters

Returns String

parseAttributes

Parse the 9th column (attributes) of a GTF feature line.

Parameters

Returns Object

parseFeature

Parse a GTF feature line.

Parameters

  • line String

Returns Object the parsed line in an object

parseDirective

Parse a GTF directive/comment line.

Parameters

Returns Object the information in the directive

formatAttributes

Format an attributes object into a string suitable for the 9th column of GTF.

Parameters
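
To make the shape of the 9th column concrete, here is a minimal sketch that serializes an attributes object of the form shown in the feature output (each key mapping to an array of already-quoted values). formatAttributesSketch is a hypothetical stand-in, it performs no escaping, and returning . for an empty object is an assumption.

```javascript
// Standalone sketch: formatAttributesSketch is hypothetical and does no escaping.
function formatAttributesSketch(attributes) {
  const parts = []
  for (const [key, values] of Object.entries(attributes || {})) {
    // Each value becomes its own `key value;` pair.
    for (const value of values) parts.push(`${key} ${value};`)
  }
  // Assumption: an empty attributes object is written as '.'.
  return parts.length > 0 ? parts.join(' ') : '.'
}

console.log(formatAttributesSketch({ transcript_id: ['"Apple1"'] }))
```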

formatFeature

Format a feature object or array of feature objects into one or more lines of GTF.

Parameters

  • featureOrFeatures

formatDirective

Format a directive into a line of GTF.

Parameters

Returns String

formatComment

Format a comment into a GTF comment. Yes I know this is just adding a # and a newline.

Parameters

Returns String

formatSequence

Format a sequence object as FASTA.

Parameters

Returns String formatted single FASTA sequence
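
For a sequence object with the id, description, and sequence fields shown earlier, FASTA output could look like the sketch below. formatSequenceSketch is a hypothetical stand-in, and writing the sequence on a single unwrapped line is an assumption.

```javascript
// Standalone sketch: formatSequenceSketch is hypothetical, not the library function.
function formatSequenceSketch({ id, description, sequence }) {
  // The FASTA header is '>' + id, optionally followed by a description.
  const header = description ? `>${id} ${description}` : `>${id}`
  // Assumption: the sequence is written on one line, without wrapping.
  return `${header}\n${sequence}\n`
}

console.log(
  formatSequenceSketch({
    id: 'ctgA',
    description: 'test contig',
    sequence: 'ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA',
  }),
)
```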

formatItem

Format a directive, comment, or feature, or array of such items, into one or more lines of GTF.

Parameters

Notes and resources

License

MIT © Robert Buels
