isugenomics / common_scripts
This project forked from aseetharam/common_scripts
my bin directory
Hello,
I understand that you are not the original author of gff2fasta.pl (nor is the author clear from the Biostars page), but I was wondering if I could get some help modifying the script so that it works with an oddly formatted GFF file I was given.
Here are two entries from the file:
7000000037415267 . gene 21339 22504 . + . ID=7000003035155523;Function=polygalacturonase%2C%20putative;Name=PITG_19619
7000000037415267 . mRNA 21339 22504 . + . ID=7000003035155526;Parent=7000003035155523;Function=polygalacturonase%2C%20putative;Name=PITG_19619
7000000037415267 . exon 21339 21714 . + . ID=7000003035155526.exon2;Parent=7000003035155526
7000000037415267 . CDS 21339 21714 . + 0 ID=cds.7000003035155526;Parent=7000003035155526
7000000037415267 . exon 21749 22170 . + . ID=7000003035155526.exon3;Parent=7000003035155526
7000000037415267 . CDS 21749 22170 . + 2 ID=cds.7000003035155526;Parent=7000003035155526
7000000037415267 . exon 22307 22504 . + . ID=7000003035155526.exon4;Parent=7000003035155526
7000000037415267 . CDS 22307 22504 . + 0 ID=cds.7000003035155526;Parent=7000003035155526
7000000037414998 . gene 679960 682584 . + . ID=7000003035181604;Function=conserved%20hypothetical%20protein;Name=PITG_09139
7000000037414998 . mRNA 679960 682584 . + . ID=7000003035181607;Parent=7000003035181604;Function=conserved%20hypothetical%20protein;Name=PITG_09139
7000000037414998 . five_prime_UTR 679960 680620 . + . ID=7000003035181607.utr5p1;Parent=7000003035181607
7000000037414998 . five_prime_UTR 680710 680802 . + . ID=7000003035181607.utr5p2;Parent=7000003035181607
7000000037414998 . five_prime_UTR 680907 680909 . + . ID=7000003035181607.utr5p3;Parent=7000003035181607
7000000037414998 . exon 679960 680620 . + . ID=7000003035181607.exon1;Parent=7000003035181607
7000000037414998 . exon 680710 680802 . + . ID=7000003035181607.exon2;Parent=7000003035181607
7000000037414998 . exon 680907 681227 . + . ID=7000003035181607.exon3;Parent=7000003035181607
7000000037414998 . CDS 680910 681227 . + 0 ID=cds.7000003035181607;Parent=7000003035181607
7000000037414998 . exon 681298 681489 . + . ID=7000003035181607.exon4;Parent=7000003035181607
7000000037414998 . CDS 681298 681489 . + 0 ID=cds.7000003035181607;Parent=7000003035181607
7000000037414998 . exon 681563 682584 . + . ID=7000003035181607.exon5;Parent=7000003035181607
7000000037414998 . CDS 681563 682174 . + 0 ID=cds.7000003035181607;Parent=7000003035181607
7000000037414998 . three_prime_UTR 682175 682584 . + . ID=7000003035181607.utr3p1;Parent=7000003035181607
The gff2fasta.pl script uses whatever string follows "ID=" as the entry/gene name. That is, the gene output will be:
>7000003035155523
ATGCCTTTAGCGACGATCACTCTCCTCTTCTTCGCTAGCTTACCTCCCCAATCCACTCTT...
>7000003035181604
GGTGAACATGTTGTCTGTATTGTCTGTACTTGCCGACCATGAGCTCCTCGGTAGTGCACA...
And the output for mRNA, peptides, and CDS will be:
>7000003035155526
MPLATITLLFFASLPPQSTLHSAICFLPTQRPLKVQPAMKLVSSAFGVFALLAAFVSGST...
>7000003035181607
MSFSKSNLPPTLPVAIKKEREDPSSLSGSMSIPGSSSSIPRKDSIGWGADDFLGMISHTP...
Is there a way to name each record of the resulting FASTA file after the string following "Name=" instead? In these cases, that would be:
>PITG_19619
ATGCCTTTAGCGACGATCACTCTCCTCTTCTTCGCTAGCTTACCTCCCCAATCCACTCTT...
>PITG_09139
GGTGAACATGTTGTCTGTATTGTCTGTACTTGCCGACCATGAGCTCCTCGGTAGTGCACA...
and
>PITG_19619
MPLATITLLFFASLPPQSTLHSAICFLPTQRPLKVQPAMKLVSSAFGVFALLAAFVSGST...
>PITG_09139
MSFSKSNLPPTLPVAIKKEREDPSSLSGSMSIPGSSSSIPRKDSIGWGADDFLGMISHTP...
Much appreciated,
Mike
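One way to get the names requested above without touching gff2fasta.pl is to post-process its output: build an ID-to-Name table from the GFF file, then rewrite the FASTA headers. The following is a minimal Python sketch of that idea, not a patch to the script itself. The function names (`parse_attributes`, `id_to_name`, `rename_headers`) are hypothetical, and it assumes the GFF file is tab-delimited with `ID=` and `Name=` in the ninth column, as in the entries shown.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict, decoding %XX escapes."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def id_to_name(gff_lines):
    """Map each feature's ID to its Name, for features that carry both."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # assumption: well-formed rows have nine tab-separated columns
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Name" in attrs:
            mapping[attrs["ID"]] = attrs["Name"]
    return mapping


def rename_headers(fasta_lines, mapping):
    """Yield FASTA lines with '>ID' replaced by '>Name' where a mapping exists."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            # Unknown IDs are left unchanged; any description after the ID is dropped.
            yield ">" + mapping.get(seq_id, seq_id)
        else:
            yield line
```

In the file above, both the gene and the mRNA rows carry `Name=`, so gene, mRNA, CDS, and peptide headers would all resolve to the same PITG name. If only gene rows carried `Name=`, the table would also need to follow `Parent=` links from the mRNA to the gene.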
Hi there,
This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:
Based on the recently published report, "Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows", there is a strong chance that this parameter is misused in your repository.
If the use of this parameter was intentional, please feel free to ignore and close this issue, but I would highly recommend adding a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy, as this simple automation is not smart enough to identify such issues.
Thank you!
-- Arman (armish/blast-patrol)
Hi, after a long search on the net I found this awesome script. It works nicely. However, I need to extract all exon sequences from a genome based on GFF3 and FASTA files. Please find attached a GFF3 sample file.
From that file I need to extract these sequences:
>Eucgr.A00001.1.v2.0.exon.1
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.2
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.3
ACTGTGACA......
(...)
>Eucgr.A00001.1.v2.0.exon.12
ACTGTGACA......
(...)
Could you help me?
Thank you so much!
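The per-exon output requested above could be produced along these lines. This is a rough Python sketch under several assumptions, not a feature of gff2fasta.pl: the genome is already loaded into a dict of sequence ID to sequence string, each exon row has a single `Parent=` value, exons are numbered in the order they appear in the file, and the helper names (`reverse_complement`, `extract_exons`) are hypothetical.

```python
from collections import defaultdict

# Translation table for complementing DNA; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_exons(gff_lines, genome):
    """Return (header, sequence) pairs for every exon row in a GFF3 file.

    genome: dict mapping sequence IDs (GFF column 1) to sequence strings.
    Headers look like '<Parent>.exon.<n>', numbered per parent in file order.
    """
    counters = defaultdict(int)
    records = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent", attrs.get("ID", "unknown"))
        counters[parent] += 1
        # GFF3 coordinates are 1-based and inclusive.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = reverse_complement(seq)
        records.append((f"{parent}.exon.{counters[parent]}", seq))
    return records
```

Each pair can then be written out as `>header` followed by the sequence. If the exons of minus-strand transcripts should be numbered in transcript order rather than file order, they would need to be sorted by coordinate before numbering.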