common_scripts's People

Contributors

aedawid, aseetharam, inversewander, isugif, remkv6

common_scripts's Issues

gff2fasta.pl: help with a poorly formatted gff file

Hello,
I understand that you are not the original author of gff2fasta.pl (and the Biostars page does not make clear who is), but I was wondering if I could get some help modifying the script to work with an oddly formatted GFF file I was given.

Here are two entries from the file:

7000000037415267        .       gene    21339   22504   .       +       .       ID=7000003035155523;Function=polygalacturonase%2C%20putative;Name=PITG_19619
7000000037415267        .       mRNA    21339   22504   .       +       .       ID=7000003035155526;Parent=7000003035155523;Function=polygalacturonase%2C%20putative;Name=PITG_19619
7000000037415267        .       exon    21339   21714   .       +       .       ID=7000003035155526.exon2;Parent=7000003035155526
7000000037415267        .       CDS     21339   21714   .       +       0       ID=cds.7000003035155526;Parent=7000003035155526
7000000037415267        .       exon    21749   22170   .       +       .       ID=7000003035155526.exon3;Parent=7000003035155526
7000000037415267        .       CDS     21749   22170   .       +       2       ID=cds.7000003035155526;Parent=7000003035155526
7000000037415267        .       exon    22307   22504   .       +       .       ID=7000003035155526.exon4;Parent=7000003035155526
7000000037415267        .       CDS     22307   22504   .       +       0       ID=cds.7000003035155526;Parent=7000003035155526

7000000037414998        .       gene    679960  682584  .       +       .       ID=7000003035181604;Function=conserved%20hypothetical%20protein;Name=PITG_09139
7000000037414998        .       mRNA    679960  682584  .       +       .       ID=7000003035181607;Parent=7000003035181604;Function=conserved%20hypothetical%20protein;Name=PITG_09139
7000000037414998        .       five_prime_UTR  679960  680620  .       +       .       ID=7000003035181607.utr5p1;Parent=7000003035181607
7000000037414998        .       five_prime_UTR  680710  680802  .       +       .       ID=7000003035181607.utr5p2;Parent=7000003035181607
7000000037414998        .       five_prime_UTR  680907  680909  .       +       .       ID=7000003035181607.utr5p3;Parent=7000003035181607
7000000037414998        .       exon    679960  680620  .       +       .       ID=7000003035181607.exon1;Parent=7000003035181607
7000000037414998        .       exon    680710  680802  .       +       .       ID=7000003035181607.exon2;Parent=7000003035181607
7000000037414998        .       exon    680907  681227  .       +       .       ID=7000003035181607.exon3;Parent=7000003035181607
7000000037414998        .       CDS     680910  681227  .       +       0       ID=cds.7000003035181607;Parent=7000003035181607
7000000037414998        .       exon    681298  681489  .       +       .       ID=7000003035181607.exon4;Parent=7000003035181607
7000000037414998        .       CDS     681298  681489  .       +       0       ID=cds.7000003035181607;Parent=7000003035181607
7000000037414998        .       exon    681563  682584  .       +       .       ID=7000003035181607.exon5;Parent=7000003035181607
7000000037414998        .       CDS     681563  682174  .       +       0       ID=cds.7000003035181607;Parent=7000003035181607
7000000037414998        .       three_prime_UTR 682175  682584  .       +       .       ID=7000003035181607.utr3p1;Parent=7000003035181607

The gff2fasta.pl script uses whatever string follows "ID=" as the entry/gene name. That is, the gene output will be:

>7000003035155523
ATGCCTTTAGCGACGATCACTCTCCTCTTCTTCGCTAGCTTACCTCCCCAATCCACTCTT...
>7000003035181604
GGTGAACATGTTGTCTGTATTGTCTGTACTTGCCGACCATGAGCTCCTCGGTAGTGCACA...

And the output for mRNA, peptides, cds will be:

>7000003035155526
MPLATITLLFFASLPPQSTLHSAICFLPTQRPLKVQPAMKLVSSAFGVFALLAAFVSGST...
>7000003035181607
MSFSKSNLPPTLPVAIKKEREDPSSLSGSMSIPGSSSSIPRKDSIGWGADDFLGMISHTP...

Is there a way to name each record in the resulting FASTA file with the string following "Name=" instead? In these cases, that would be:

>PITG_19619
ATGCCTTTAGCGACGATCACTCTCCTCTTCTTCGCTAGCTTACCTCCCCAATCCACTCTT...
>PITG_09139
GGTGAACATGTTGTCTGTATTGTCTGTACTTGCCGACCATGAGCTCCTCGGTAGTGCACA...

and

>PITG_19619
MPLATITLLFFASLPPQSTLHSAICFLPTQRPLKVQPAMKLVSSAFGVFALLAAFVSGST...
>PITG_09139
MSFSKSNLPPTLPVAIKKEREDPSSLSGSMSIPGSSSSIPRKDSIGWGADDFLGMISHTP...

Much appreciated,
Mike
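One way to get there without touching gff2fasta.pl is to rename the FASTA headers afterwards. The following is only a minimal sketch in Python, not a patch to the script itself: it assumes the ID= and Name= attributes look like the sample rows above, and the function names `id_to_name` and `rename_headers` are made up for illustration.

```python
from urllib.parse import unquote


def id_to_name(gff_lines):
    """Map each feature ID to its Name attribute (gene and mRNA rows in the sample)."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            # Assumption: fall back to whitespace-delimited rows, as in the pasted sample.
            cols = line.split(None, 8)
        if len(cols) < 9:
            continue
        attrs = dict(
            field.split("=", 1)
            for field in cols[8].strip().split(";")
            if "=" in field
        )
        if "ID" in attrs and "Name" in attrs:
            # GFF3 attribute values may be percent-encoded (e.g. %20).
            mapping[attrs["ID"]] = unquote(attrs["Name"])
    return mapping


def rename_headers(fasta_lines, mapping):
    """Yield FASTA lines with the first header token replaced by its Name, if known."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            seq_id, _, rest = line[1:].partition(" ")
            new_id = mapping.get(seq_id, seq_id)  # leave unknown IDs unchanged
            line = ">" + new_id + (" " + rest if rest else "")
        yield line


if __name__ == "__main__":
    gff = [
        "7000000037415267\t.\tgene\t21339\t22504\t.\t+\t.\t"
        "ID=7000003035155523;Function=polygalacturonase%2C%20putative;Name=PITG_19619\n",
    ]
    fasta = [">7000003035155523\n", "ATGCCTTTAGCG\n"]
    for out_line in rename_headers(fasta, id_to_name(gff)):
        print(out_line)
```

Note that a gene and its mRNA share the same Name in this file, so headers stay unique only within each output file, and a gene with several mRNAs carrying the same Name would produce duplicate headers.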

Confirm that use of BLAST's `-max_target_seqs` is intentional

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

Based on the recently published report "Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows", there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.

Thank you!
-- Arman (armish/blast-patrol)
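If the goal in those scripts was simply to keep the top hit per query, one cautious alternative is to let BLAST report more hits and select the best one afterwards. The sketch below is an assumption about how that could be done, not something the report or this repository prescribes. It expects rows in the default tabular layout (`-outfmt 6`), where column 1 is the query ID and column 12 is the bit score; the function name `best_hits` and the file name in the comment are hypothetical.

```python
import csv


def best_hits(rows):
    """Return {query_id: row} keeping the row with the highest bit score per query.

    `rows` is an iterable of lists of strings in default -outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, bitscore = row[0], float(row[11])
        if query not in best or bitscore > float(best[query][11]):
            best[query] = row
    return best


# Hypothetical usage on a BLAST result file:
#     with open("hits.tsv", newline="") as handle:
#         top = best_hits(csv.reader(handle, delimiter="\t"))
```

Ties are resolved in favour of the first row seen; whether bit score is the right ranking criterion depends on the analysis.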

Extract exon sequence based on GFF3 and FASTA

Hi, after a long search on the net I found this awesome script. It works nicely. However, I need to extract all exon sequences from a genome based on a GFF3 file and a FASTA file. Please find attached a sample GFF3 file.
From that file I need to extract these sequences:
>Eucgr.A00001.1.v2.0.exon.1
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.2
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.3
ACTGTGACA......
(...)
>Eucgr.A00001.1.v2.0.exon.12
ACTGTGACA......
(...)
Could you help me?
Thank you so much!

sample_GFF3_tsv.txt
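A rough Python sketch of per-exon extraction follows. It is not part of gff2fasta.pl and has not been run against the attached sample, so several things are assumptions: rows are tab-delimited, the header is taken from the exon's ID= attribute when present, and otherwise it is built as `<Parent>.exon.<n>` with exons numbered in file order. The function names are hypothetical.

```python
def read_fasta(lines):
    """Read FASTA lines into {sequence_id: sequence}, keyed by the first header token."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def exon_records(gff_lines, genome):
    """Yield (header, sequence) for every exon row of a GFF3 file."""
    counts = {}  # exons seen so far per parent, used for fallback numbering
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent", seqid)
        counts[parent] = counts.get(parent, 0) + 1
        header = attrs.get("ID") or f"{parent}.exon.{counts[parent]}"
        seq = genome[seqid][start - 1:end]  # GFF3 coordinates are 1-based, inclusive
        if strand == "-":
            seq = reverse_complement(seq)
        yield header, seq


# Hypothetical usage:
#     with open("genome.fa") as fa, open("annotation.gff3") as gff:
#         for header, seq in exon_records(gff, read_fasta(fa)):
#             print(f">{header}\n{seq}")
```

For genes on the minus strand, numbering by file order may not match transcript order, so the fallback headers should be checked against the annotation.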
