evolinc-i's Issues
Modify "updated gff"
Change gene IDs in the updated gff so that they contain no "_" (underscores), as HTSeq version 0.6.1 appears to have an issue with them when they are linked to the gene_id.
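As a rough illustration of the change requested above, here is a minimal Python sketch. It assumes the file uses GTF-style attributes of the form `gene_id "..."`. The function name and the choice of "." as the replacement character are hypothetical and not part of Evolinc.

```python
import re

# Matches a GTF-style attribute such as: gene_id "TCONS_0001_gene"
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')


def strip_gene_id_underscores(line: str, replacement: str = ".") -> str:
    """Return the line with underscores in the gene_id value replaced.

    Comment lines and lines with fewer than nine tab-separated columns
    are returned unchanged. Other attributes (e.g. transcript_id) are
    left as they are.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line
    fields[8] = GENE_ID_RE.sub(
        lambda m: 'gene_id "{}"'.format(m.group(1).replace("_", replacement)),
        fields[8],
    )
    return "\t".join(fields) + newline
```

Note that replacing underscores could in principle make two different IDs identical, so the result is worth checking for duplicates.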
Running evolinc on a cluster
Hi,
I am interested in using the evolinc pipeline. I work on a cluster and that's where all my data is, so I want to run evolinc there from the command line. I tried to get docker, but it is not available on the cluster and I cannot install it myself. Is there a way I can run evolinc on the command line on the cluster without docker?
IUPAC ambiguity codes in FASTA file
An error is encountered when running Evolinc-I on FASTA files that contain IUPAC ambiguity codes (e.g. KeyError: 'R'). The error did not occur when the ambiguous characters were replaced with 'N' and Evolinc-I was run again.
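The workaround described above (replacing ambiguous characters with 'N' before running Evolinc-I) could be sketched in Python as follows. This is only an illustration. The function name is hypothetical, and the set of letters masked here (R, Y, S, W, K, M, B, D, H, V) is the standard IUPAC nucleotide ambiguity set other than N.

```python
import re

# IUPAC nucleotide ambiguity codes other than N, in either case
AMBIGUITY_RE = re.compile(r"[RYSWKMBDHVryswkmbdhv]")


def mask_ambiguity_codes(fasta_text: str) -> str:
    """Replace IUPAC ambiguity codes on sequence lines with N (or n).

    Header lines (starting with '>') are left untouched, and lower-case
    codes become 'n' so that any soft-masking is preserved.
    """
    masked = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            masked.append(line)
        else:
            masked.append(
                AMBIGUITY_RE.sub(
                    lambda m: "n" if m.group(0).islower() else "N", line
                )
            )
    return "".join(line + "\n" for line in masked)
```

For large genomes it would be better to stream the file line by line rather than hold it in memory as one string.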
Add Rfam automatic screen
Add an Rfam screen to the end of Evolinc-I so that this doesn't have to be done outside of the DE/command line. This will entail adding the Rfam library of RNAs (except snoRNAs).
Add an option for long read filtering/comparison.
People should be able to run long read transcripts through Evolinc. Alternatively, they should be able to compare their short read derived lncRNAs against any long read transcripts that are available. This would be an optional argument that would provide further support for the lncRNA annotation.
Questions about Evolinc modifications since publication
Hi,
I have read the paper on the most recent version of Evolinc and I have the following questions:
- The paper says to run the output FASTA against Rfam, but I see in the resolved issues here on GitHub that this feature has been added. Do you still suggest that I run my output against Rfam?
- Does Evolinc detect only long intergenic non-coding RNAs, or does it detect other types too?
- Since the paper came out, the developers of cuffcompare have also released gffcompare, which is analogous to cuffcompare and, I believe, produces the same output. I would like to use gffcompare because its usage is simpler. Can I use the output gtf from gffcompare in Evolinc, or do you suggest I stick to cuffcompare?
Thank you
Create "intronic space" parameter
Allow for variable distance for removing gaps and merging hits on similar scaffolds (max gap length currently set to length of query lncRNA).
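To make the request concrete, here is a minimal Python sketch of merging hits on one scaffold with a user-supplied maximum gap. It is not Evolinc's implementation. The function name, the `max_gap` parameter and the 1-based inclusive coordinate convention are assumptions for illustration.

```python
from typing import List, Tuple

# (start, end) in 1-based inclusive coordinates, all on the same scaffold
Hit = Tuple[int, int]


def merge_hits(hits: List[Hit], max_gap: int) -> List[Hit]:
    """Merge hits separated by at most max_gap bases.

    The gap between two hits is the number of bases strictly between
    the end of the first and the start of the second. Overlapping hits
    have a negative gap and are always merged.
    """
    merged: List[Hit] = []
    for start, end in sorted(hits):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Passing the query lncRNA length as `max_gap` would correspond to the current behaviour described above, while a smaller or larger value would give the variable "intronic space".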
Add AOTs and SOTs to updated GFF file
Users have requested the addition of AOT and SOT lncRNAs to the GFF file in order to perform differential expression.
Error in calling unlink in diamondBlast step
As a result, no longest_ORFS_cat.pep.blastp file is produced. Not sure if this is occurring on all systems or just within a Windows Docker container.
Replace underscores in Known_lincRNA bed file
There is a known issue when appending the gene ID of a known lincRNA to the final summary table if that known lincRNA has an underscore in its name in the bed/gff file used as input.
Chromosome IDs of Evolinc identified lincRNAs do not match parent annotation.
After running Evolinc 1.7.5 and Evolinc-Merge on the Discovery Environment, the output annotations have chromosome IDs of newly identified lincRNAs that do not match the parent (input) annotation.
The input annotation uses the nomenclature: 'Chr1', 'Chr2', 'Chr3', etc. and 'Scaffold12345'. The 'Final_updated.gtf' from Evolinc-Merge keeps this pattern for existing features, but new lincRNAs will lose the "Chr" identifier or the "Scaffold" identifier in column 1. Additionally, scaffold numbers that begin with a 0 in the parent annotation (e.g. "Scaffold00123") will lose those 0 values and will show "123" as the new chromosome.
Is this an issue for lincRNA identification if Evolinc is not able to assign the lincRNAs to the "known" chromosomes?
I have attached gzipped input and output annotations for your reference.
Thank you for your support.
Final_updated.gtf.gz
Cs_genes_v2.1_annot.gff3.gz
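One way to check for the mismatch described in this issue is to compare the column 1 identifiers of the two annotations. The Python sketch below lists sequence IDs that appear in the output annotation but not in the parent annotation. The function names are hypothetical, and it only assumes tab-separated GTF/GFF lines.

```python
def seqids(annotation_lines):
    """Collect the column 1 sequence IDs from GTF/GFF lines."""
    ids = set()
    for line in annotation_lines:
        # Skip blank lines and comment/directive lines
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(parent_lines, output_lines):
    """Return sorted IDs present in the output but absent from the parent."""
    return sorted(seqids(output_lines) - seqids(parent_lines))
```

Both functions accept any iterable of lines, for example a file opened with `gzip.open(path, "rt")`. IDs such as "1" or "123" in the result would correspond to the lost "Chr" and "Scaffold" prefixes reported above.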
Using merged from gffcompare on DE
I tried running Evolinc-I on a cluster with Singularity and ran into a number of issues, so I am opting to run it on the DE instead. I have the following question: my merged gtf is from gffcompare and not cuffcompare (since cuffcompare is now outdated and gffcompare is basically its newer version). Previously I had been told this was fine, but to use the -r flag. I was wondering what I can do when running it on the DE. Is there an option for this?
Offer the option for FPKM filter in Evolinc-I
This could be done in a similar way to the coverage/base filter.
Error running Evolinc-I with both mandatory and optional files for sample data
I am getting this error message at the end of the Evolinc run with optional files for sample data:
Error in as.data.table(newmat) : could not find function "as.data.table"
Calls: cSplit -> is.data.table
Execution halted
cp: cannot stat 'final_Summary_table.tsv': No such file or directory
All necessary files written to test_out
Finished Evolinc-part-I!
Error in parsing transcripts
Hi,
I was able to run Evolinc with the test data, but now I am getting an error when using it on braker genome annotations. This is the error message:
Tue Mar 9 17:39:39 UTC 2021
No fasta index found for referencegenome.fa. Rebuilding, please wait..
Fasta index rebuilt.
Generating Number of transcripts
##################################
grep: transcripts.*.fa: No such file or directory
transcripts.*.fa
##################################
cat: transcripts.*.filter.fa: No such file or directory
[INFO] read file 'transcripts.all.overlapping.filter.fa'
[INFO] Predicting coding potential, please wait ...
[INFO] Running Done!
[INFO] cost time: 0s
[ERROR] putative_intergenic.genes.fa is not a file
grep: putative_intergenic.genes_cpc2.txt: No such file or directory
Can't open putative_intergenic.genes.fa: No such file or directory.
Generating Number of coding and noncoding
##################################
grep: putative_intergenic.genes_cpc2.txt: No such file or directory
putative_intergenic_coding_transcripts
grep: putative_intergenic.genes_cpc2.txt: No such file or directory
putative_intergenic_noncoding_transcripts
overlapping_coding_transcripts 1
overlapping_coding_transcripts 0
It looks like it's not able to extract the transcript sequences and run transdecoder correctly?
This is the format of my gtf file:
CsWA_scaf115 AUGUSTUS gene 1563351 1564313 . - . jg29579
CsWA_scaf115 AUGUSTUS transcript 1563351 1564313 . - . transcript_id "jg29579.t1"; gene_id "jg29579"
CsWA_scaf115 AUGUSTUS stop_codon 1563351 1563353 . - 0 transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_scaf115 AUGUSTUS CDS 1563351 1564313 0.88 - 0 transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_scaf115 AUGUSTUS exon 1563351 1564313 . - . transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_scaf115 AUGUSTUS start_codon 1564311 1564313 . - 0 transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_chr04 AUGUSTUS gene 6431667 6433016 . + . jg761
CsWA_chr04 AUGUSTUS transcript 6431667 6433016 . + . transcript_id "jg761.t1"; gene_id "jg761"
CsWA_chr04 AUGUSTUS start_codon 6431667 6431669 . + 0 transcript_id "jg761.t1"; gene_id "jg761";
CsWA_chr04 AUGUSTUS CDS 6431667 6433016 0.94 + 0 transcript_id "jg761.t1"; gene_id "jg761";
CsWA_chr04 AUGUSTUS exon 6431667 6433016 . + . transcript_id "jg761.t1"; gene_id "jg761";
CsWA_chr04 AUGUSTUS stop_codon 6433014 6433016 . + 0 transcript_id "jg761.t1"; gene_id "jg761";
CsWA_scaf115 AUGUSTUS gene 4180987 4181720 . + . jg31437
CsWA_scaf115 AUGUSTUS transcript 4180987 4181720 . + . transcript_id "jg31437.t1"; gene_id "jg31437"
CsWA_scaf115 AUGUSTUS start_codon 4180987 4180989 . + 0 transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115 AUGUSTUS CDS 4180987 4181063 0.59 + 0 transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115 AUGUSTUS exon 4180987 4181063 . + . transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115 AUGUSTUS intron 4181064 4181137 . + . transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115 AUGUSTUS CDS 4181138 4181720 0.54 + 1 transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115 AUGUSTUS exon 4181138 4181720 . + . transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115 AUGUSTUS stop_codon 4181718 4181720 . + 0 transcript_id "jg31437.t1"; gene_id "jg31437";
Is there anything wrong with that?
Thank you in advance
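In the excerpt above, the "gene" lines carry a bare ID in column 9 rather than `key "value";` pairs. Whether that is related to the error is not established here, but a quick way to list such lines is sketched below in Python. The function name is hypothetical, and the check assumes a tab-separated GTF with attributes in the `key "value"` form.

```python
import re

# Matches GTF attribute pairs such as: transcript_id "jg761.t1"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def lines_missing_ids(gtf_lines, required=("transcript_id", "gene_id")):
    """Return (line_number, feature, missing_keys) for incomplete lines.

    A line is reported when its attribute column lacks any of the
    required keys, or when it has fewer than nine columns.
    """
    problems = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        attributes = dict(ATTR_RE.findall(fields[8])) if len(fields) >= 9 else {}
        missing = [key for key in required if key not in attributes]
        if missing:
            feature = fields[2] if len(fields) > 2 else "?"
            problems.append((number, feature, missing))
    return problems
```

Running this over the annotation before passing it to Evolinc gives a short list of lines to inspect by hand.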
final_summary_table_gen_evo-I.R sub() function?
What does "AGE_PLUS" refer to in line 422 of the R script?
(422) merge2$V1_2 <- sub("AGE_PLUS", "Yes", merge2$V1_2)