biocommons / uta

Universal Transcript Archive: comprehensive genome-transcript alignments; multiple transcript sources, versions, and alignment methods; available as a docker image

License: Apache License 2.0

Languages: Makefile 2.56%, Python 81.82%, Perl 11.31%, Shell 1.59%, DIGITAL Command Language 0.01%, PLpgSQL 1.72%, Dockerfile 0.99%
Topics: bioinformatics, sequences, sequence-alignment

uta's People

Contributors

afrubin, ahwagner, andreasprlic, b0d0nne11, holtgrewe, invitae-vince, korikuzma, ktennesseninvitae, reece, safay, zzwiesler


uta's Issues

collect and load BIC transcripts

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #37
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


Collect and load exon structures for the BIC transcripts that we'd like to be able to report on. If genome alignments are not available for these transcripts, loading them is out of scope (it could be brought into scope with additional time).

Links

  • imported from: CORE-37 (Invitae access required)
  • is related to: issue #142

restructure UTA to make more like other projects

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #116
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


UTA grew organically over ~2 years. Its current structure reflects that chaotic growth and the diversity of tools that were included there. Some of those tools have been spun out into separate repos.

This ticket covers restructuring the repo to resemble a typical Python package, and especially hgvs, bdi, and eutils, for consistency. Remove cruft at the same time.

Links

  • imported from: CORE-116 (Invitae access required)

add and verify special-request transcripts

Originally reported by Geoffrey Nilsen (Bitbucket: gnilsen, GitHub: Unknown) in biocommons/uta #112
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


I'm not sure what governs which transcripts are or are not in the UTA database (uta0 on uta.invitae.com), but I think all transcripts for our current panel should be in there.

NEK8 (NM_178170.2) is missing from uta0.transcript, uta0.transcript_exon, and uta0.genomic_exon.
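One way to find gaps like this is to query each table for the accessions a panel expects. The sketch below is hypothetical: it assumes each table exposes an accession column named `ac` (the real uta0 column names may differ) and uses an in-memory SQLite database so it runs standalone. Against the PostgreSQL UTA instance, the table names would be schema-qualified (e.g. `uta0.transcript`) and a driver such as psycopg2 would use `%s` placeholders instead of `?`.

```python
import sqlite3
from typing import Dict, Iterable, List, Sequence

# Assumption: every table has an accession column named "ac".
# Table names are fixed constants here, so formatting them into SQL is safe.
TABLES = ("transcript", "transcript_exon", "genomic_exon")


def missing_by_table(
    conn: sqlite3.Connection,
    accessions: Iterable[str],
    tables: Sequence[str] = TABLES,
) -> Dict[str, List[str]]:
    """Return, for each table, the requested accessions that have no rows."""
    acs = list(accessions)
    if not acs:
        return {table: [] for table in tables}
    placeholders = ",".join("?" for _ in acs)
    report: Dict[str, List[str]] = {}
    for table in tables:
        rows = conn.execute(
            f"SELECT DISTINCT ac FROM {table} WHERE ac IN ({placeholders})", acs
        ).fetchall()
        present = {row[0] for row in rows}
        report[table] = [ac for ac in acs if ac not in present]
    return report


if __name__ == "__main__":
    # Toy data: NM_TEST.1 is a hypothetical accession loaded in every table;
    # NM_178170.2 (NEK8) is absent from all of them.
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute(f"CREATE TABLE {table} (ac TEXT)")
        conn.execute(f"INSERT INTO {table} VALUES ('NM_TEST.1')")
    print(missing_by_table(conn, ["NM_TEST.1", "NM_178170.2"]))
```

Reporting per table, rather than a single yes/no, distinguishes a transcript that was never loaded from one whose exon rows are incomplete.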

Links

  • imported from: CORE-112 (Invitae access required)

implement transcript comparison across sources

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #38
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


Extend the current method of comparing NCBI and Ensembl transcripts to include UCSC. We want to identify these kinds of discrepancies:

  • different # of exons
  • different exon lengths
  • different exon alignments to a reference genome

One approach: a fingerprint function that returns the same value for two transcripts if and only if the combination <seq md5, cds_se_i, exons_se_i> is identical (the md5 is of the full transcript sequence, not the CDS).
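A minimal sketch of such a fingerprint, assuming each transcript is available as its full sequence, a `(start, end)` CDS interval, and a list of `(start, end)` exon intervals in transcript coordinates. The function names and record layout are hypothetical. Returning the triple itself (rather than a digest of it) keeps the "if and only if" property exact apart from md5 collisions on the sequence.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # (start_i, end_i); interbase coordinates assumed
Fingerprint = Tuple[str, Tuple[int, ...], Tuple[Tuple[int, ...], ...]]


def tx_fingerprint(
    seq: str, cds_se_i: Interval, exons_se_i: Sequence[Interval]
) -> Fingerprint:
    """Equal for two transcripts iff <seq md5, cds_se_i, exons_se_i> are equal."""
    seq_md5 = hashlib.md5(seq.encode("ascii")).hexdigest()  # full sequence, not CDS
    return (seq_md5, tuple(cds_se_i), tuple(tuple(ex) for ex in exons_se_i))


def group_by_fingerprint(
    records: Iterable[Tuple[str, str, Interval, Sequence[Interval]]],
) -> Dict[Fingerprint, List[str]]:
    """Group accessions (from any source) whose fingerprints match."""
    groups: Dict[Fingerprint, List[str]] = defaultdict(list)
    for ac, seq, cds_se_i, exons_se_i in records:
        groups[tx_fingerprint(seq, cds_se_i, exons_se_i)].append(ac)
    return dict(groups)
```

Transcripts from different sources that land in different groups differ in sequence, CDS bounds, or exon structure; a different number of exons or different exon lengths shows up as differing `exons_se_i`. Differences in alignment to a reference genome are not captured, because these intervals are in transcript coordinates.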

Links

  • imported from: CORE-38 (Invitae access required)
  • parent task: issue #6

implement stable views for common uses (particularly hgvs)

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #155
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


The goal is a minimal API, via views, for providing data to hgvs. At the same time, consider whether a schema overhaul is warranted.

Areas:

  • gene info: gene, aliases, description
  • sequence info:
  • transcript: ac, cds_se, exons_se
  • alignment: tx_ac, alt_ac, strand, method, exons, bounds, cigars
  • aligned exons: tx_ac, alt_ac, strand, method, ord, cigar, sequences

How should multiple alignments of a tx to an alt sequence be handled?
Related: multiple alignments to multiple alts (e.g., PAR and paralogs)
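The alignment and aligned-exon records listed above can be sketched as Python dataclasses. This is hypothetical: the coordinate fields (`tx_start_i`, `alt_start_i`, and so on) and the interbase convention are assumptions, and `bounds` and `cigars` are derived from the exons here rather than stored. The index keeps a list of alignments per transcript, which is one possible answer to the multiple-alignments question: callers filter by `alt_ac` and `method` instead of assuming a single alignment per transcript.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AlignedExon:
    ord: int          # exon ordinal within the transcript
    tx_start_i: int   # transcript coordinates (interbase, assumed)
    tx_end_i: int
    alt_start_i: int  # coordinates on the alt sequence (e.g. a chromosome)
    alt_end_i: int
    cigar: str


@dataclass(frozen=True)
class Alignment:
    tx_ac: str
    alt_ac: str
    strand: int                    # +1 or -1
    method: str                    # alignment method label
    exons: Tuple[AlignedExon, ...]

    @property
    def bounds(self) -> Tuple[int, int]:
        """Span of the alignment on the alt sequence."""
        return (min(e.alt_start_i for e in self.exons),
                max(e.alt_end_i for e in self.exons))

    @property
    def cigars(self) -> List[str]:
        """Per-exon CIGAR strings in exon order."""
        return [e.cigar for e in sorted(self.exons, key=lambda e: e.ord)]


class AlignmentIndex:
    """Keeps every alignment of a transcript instead of assuming one per tx."""

    def __init__(self) -> None:
        self._by_tx: Dict[str, List[Alignment]] = {}

    def add(self, aln: Alignment) -> None:
        self._by_tx.setdefault(aln.tx_ac, []).append(aln)

    def get(self, tx_ac: str, alt_ac: Optional[str] = None,
            method: Optional[str] = None) -> List[Alignment]:
        return [
            a for a in self._by_tx.get(tx_ac, [])
            if (alt_ac is None or a.alt_ac == alt_ac)
            and (method is None or a.method == method)
        ]
```

For a transcript in a pseudoautosomal region, `get(tx_ac)` would return both the X and Y alignments, while passing `alt_ac` narrows the result to one of them.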

Links

  • imported from: CORE-155 (Invitae access required)

add misc_feature support

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #119
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


Apparently some genes still require misc_feature support. They are identifiable from current ncbi.txinfo.gz files by having no exons.

The proposed solution is to (1) modify eutils to fetch misc_features and (2) modify sbin/ncbi-fetch to try exons first, falling back to misc_features.

example: PECAM1
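The exons-first, misc_feature-fallback rule can be sketched independently of the fetching code. This sketch does not use the eutils API: the `(feature_type, start_i, end_i)` tuple is a hypothetical, already-parsed representation of a record's features, and the function only decides which feature type supplies the exon structure.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical feature representation: (feature_type, start_i, end_i).
Feature = Tuple[str, int, int]


def exon_structure(
    features: Sequence[Feature],
) -> Tuple[Optional[str], List[Tuple[int, int]]]:
    """Prefer exon features; fall back to misc_feature when a record has none.

    Returns (feature type used, sorted intervals), or (None, []) if neither
    feature type is present.
    """
    for ftype in ("exon", "misc_feature"):
        intervals = sorted((s, e) for t, s, e in features if t == ftype)
        if intervals:
            return ftype, intervals
    return None, []
```

Returning the feature type alongside the intervals keeps misc_feature-derived structures (as needed for genes like PECAM1) distinguishable from ordinary exon annotations downstream.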

Links

  • imported from: CORE-119 (Invitae access required)
