
evolinc-i's People

Contributors

andrew-d-l-nelson, chosenobih, upendrak

evolinc-i's Issues

Modify "updated gff"

Change the gene IDs in the updated gff so that they contain no "_" (underscores), as HTSeq version 0.6.1 appears to have an issue with underscores when they are part of the gene_id.
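
A minimal sketch of the kind of rewrite requested above, not Evolinc's own implementation. It assumes the IDs appear as GTF-style `gene_id "..."` attributes. The function name `strip_underscores_from_gene_id` and the choice of "." as the replacement character are hypothetical; the issue does not say which character should replace the underscore.

```python
import re

# Matches a GTF-style attribute such as: gene_id "XLOC_000001"
# (assumption: the updated file stores gene IDs in this form).
GENE_ID_RE = re.compile(r'(gene_id\s+")([^"]*)(")')


def strip_underscores_from_gene_id(line: str, replacement: str = ".") -> str:
    """Return the line with underscores in the gene_id value replaced.

    Other attributes (for example transcript_id) are left untouched.
    """
    if line.startswith("#"):
        return line  # leave header/comment lines alone
    return GENE_ID_RE.sub(
        lambda m: m.group(1) + m.group(2).replace("_", replacement) + m.group(3),
        line,
    )


# Illustrative record, not taken from real Evolinc output.
example = (
    "Chr1\tEvolinc\texon\t100\t200\t.\t+\t.\t"
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";'
)
print(strip_underscores_from_gene_id(example))
```

Only the gene_id value is changed, so any tool that joins on transcript_id still sees the original identifiers.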

Running evolinc on a cluster

Hi,
I am interested in using the Evolinc pipeline. I work on a cluster, which is where all my data is, so I want to run Evolinc there from the command line. I tried to get Docker, but it is not available on the cluster and I cannot install it myself. Is there a way to run Evolinc on the command line on the cluster without Docker?

IUPAC ambiguity codes in FASTA file

An error occurs when running Evolinc-I on FASTA files that contain IUPAC ambiguity codes (e.g. KeyError: 'R'). The error did not occur when the ambiguous characters were replaced with 'N' and Evolinc-I was run again.
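
The workaround described above can be sketched as follows. This is an illustration under stated assumptions rather than part of Evolinc: the name `mask_ambiguity_codes` is hypothetical, and every sequence character other than A, C, G, T or N (in either case) is treated as ambiguous.

```python
import re

# Anything that is not A, C, G, T or N (either case) is replaced.
# Note: this also replaces gap or stop symbols such as "-" or "*".
NON_ACGTN = re.compile(r"[^ACGTNacgtn]")


def mask_ambiguity_codes(fasta_lines):
    """Yield FASTA lines with ambiguity codes in sequence lines set to 'N'.

    Header lines (starting with '>') are passed through unchanged.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield NON_ACGTN.sub("N", line)


# Illustrative input, not real data.
records = [">seq1 example", "ACGTRYKM", "acgtnswb"]
print("\n".join(mask_ambiguity_codes(records)))
```

Because the function works line by line, it can be fed an open file handle and the result written straight to a new FASTA file without loading the genome into memory.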

Add automatic Rfam screen

Add an Rfam screen to the end of Evolinc-I so that this doesn't have to be done outside of the DE/command line. This will entail adding the Rfam library of RNAs (except snoRNAs).

Add an option for long read filtering/comparison.

People should be able to run long read transcripts through Evolinc. Alternatively, they should be able to compare their short read derived lncRNAs against any long read transcripts that are available. This would be an optional argument that would provide further support for the lncRNA annotation.

Questions about Evolinc modifications since publication

Hi,
I have read the paper describing the most recent version of Evolinc and I have the following questions:

  1. The paper says to run the output FASTA against Rfam, but the resolved issues on GitHub say that this feature has been added. Do you still suggest that I run my output against Rfam?
  2. Does Evolinc detect only long intergenic non-coding RNAs, or does it detect other types too?
  3. Since the paper came out, the developers of cuffcompare have also released gffcompare, which is analogous to cuffcompare and, I believe, produces the same output. I would like to use gffcompare because its usage is simpler. Can I use the output gtf from gffcompare in Evolinc, or do you suggest I stick to cuffcompare?

Thank you

Create "intronic space" parameter

Allow for variable distance for removing gaps and merging hits on similar scaffolds (max gap length currently set to length of query lncRNA).
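
To make the requested parameter concrete, here is a small sketch of gap-limited merging. It is not Evolinc's code: `merge_hits` and `max_gap` are hypothetical names, hits are assumed to be (scaffold, start, end) tuples, and only hits on the same scaffold are merged.

```python
from typing import List, Tuple

Hit = Tuple[str, int, int]  # (scaffold, start, end)


def merge_hits(hits: List[Hit], max_gap: int) -> List[Hit]:
    """Merge hits on the same scaffold separated by at most max_gap bases."""
    merged: List[Hit] = []
    # Sorting groups hits by scaffold and orders them by start coordinate.
    for scaffold, start, end in sorted(hits):
        if merged:
            last_scaffold, last_start, last_end = merged[-1]
            if scaffold == last_scaffold and start - last_end <= max_gap:
                # Close enough to the previous hit: extend it.
                merged[-1] = (last_scaffold, last_start, max(last_end, end))
                continue
        merged.append((scaffold, start, end))
    return merged


# Illustrative coordinates, not real data.
hits = [
    ("scaf1", 100, 200),
    ("scaf1", 250, 400),
    ("scaf1", 2000, 2100),
    ("scaf2", 150, 300),
]
print(merge_hits(hits, max_gap=100))
print(merge_hits(hits, max_gap=2000))
```

Passing the query lncRNA length as `max_gap` would mimic the current behaviour described above, while a user-supplied value would give the variable distance the issue asks for.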

Replace underscores in Known_lincRNA bed file

There is a known issue when appending the gene ID of a known lincRNA to the final summary table if that known lincRNA has an underscore in its name in the bed/gff file used as input.

Chromosome IDs of Evolinc identified lincRNAs do not match parent annotation.

After running Evolinc 1.7.5 and Evolinc-Merge on the Discovery Environment, the output annotations have chromosome IDs of newly identified lincRNAs that do not match the parent (input) annotation.

The input annotation uses the nomenclature: 'Chr1', 'Chr2', 'Chr3', etc. and 'Scaffold12345'. The 'Final_updated.gtf' from Evolinc-Merge keeps this pattern for existing features, but new lincRNAs will lose the "Chr" identifier or the "Scaffold" identifier in column 1. Additionally, scaffold numbers that begin with a 0 in the parent annotation (e.g. "Scaffold00123") will lose those 0 values and will show "123" as the new chromosome.
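
One way to list the affected identifiers is to compare column 1 of the two annotations. The sketch below is only an illustration: the helper names `seqids` and `unmatched_seqids` are hypothetical, and the sample lines are made up to mirror the pattern described above rather than copied from the attached files.

```python
def seqids(annotation_lines):
    """Collect the distinct column-1 values from GFF/GTF lines."""
    ids = set()
    for line in annotation_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(parent_lines, updated_lines):
    """Return column-1 values present in the updated file but not the parent."""
    return sorted(seqids(updated_lines) - seqids(parent_lines))


# Made-up lines that follow the naming pattern reported in this issue.
parent = [
    "Chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1",
    "Scaffold00123\tsrc\tgene\t5\t90\t.\t-\t.\tID=g2",
]
updated = parent + [
    '1\tEvolinc\texon\t500\t900\t.\t+\t.\tgene_id "x1";',
    '123\tEvolinc\texon\t10\t80\t.\t+\t.\tgene_id "x2";',
]
print(unmatched_seqids(parent, updated))
```

An empty result would mean every chromosome or scaffold name in the updated annotation also exists in the parent annotation.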

Is this an issue for lincRNA identification if Evolinc is not able to assign the lincRNAs to the "known" chromosomes?

I have attached gzipped input and output annotations for your reference.

Thank you for your support.
Final_updated.gtf.gz
Cs_genes_v2.1_annot.gff3.gz

Using merged from gffcompare on DE

I tried running Evolinc-I on a cluster with Singularity and ran into a number of issues, so I am opting to run it on the DE instead. I have the following question: my merged gtf is from gffcompare and not cuffcompare (since cuffcompare is now outdated and gffcompare is basically its newer version). Previously I had been told this was fine as long as I used the -r flag. What can I do when running it on the DE? Is there an option for this?

Error running Evolinc-I with both mandatory and optional files for sample data

I am getting this error message at the end of the Evolinc run with optional files for sample data:

Error in as.data.table(newmat) : could not find function "as.data.table"
Calls: cSplit -> is.data.table
Execution halted
cp: cannot stat 'final_Summary_table.tsv': No such file or directory
All necessary files written to test_out
Finished Evolinc-part-I!

Error in parsing transcripts

Hi,
I was able to run Evolinc with the test data, but now I am getting an error when using it on BRAKER genome annotations. This is the error message:

Tue Mar 9 17:39:39 UTC 2021
No fasta index found for referencegenome.fa. Rebuilding, please wait..
Fasta index rebuilt.
Generating Number of transcripts
##################################
grep: transcripts.*.fa: No such file or directory
transcripts.*.fa 
##################################
cat: transcripts.*.filter.fa: No such file or directory
[INFO] read file 'transcripts.all.overlapping.filter.fa'
[INFO] Predicting coding potential, please wait ...
[INFO] Running Done!
[INFO] cost time: 0s
[ERROR] putative_intergenic.genes.fa is not a file
grep: putative_intergenic.genes_cpc2.txt: No such file or directory
Can't open putative_intergenic.genes.fa: No such file or directory.
Generating Number of coding and noncoding
##################################
grep: putative_intergenic.genes_cpc2.txt: No such file or directory
putative_intergenic_coding_transcripts
grep: putative_intergenic.genes_cpc2.txt: No such file or directory
putative_intergenic_noncoding_transcripts
overlapping_coding_transcripts 1
overlapping_coding_transcripts 0

It looks like it's not able to extract the transcript sequences and run transdecoder correctly?
This is the format of my gtf file:

CsWA_scaf115    AUGUSTUS        gene    1563351 1564313 .       -       .       jg29579
CsWA_scaf115    AUGUSTUS        transcript      1563351 1564313 .       -       .       transcript_id "jg29579.t1"; gene_id "jg29579"
CsWA_scaf115    AUGUSTUS        stop_codon      1563351 1563353 .       -       0       transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_scaf115    AUGUSTUS        CDS     1563351 1564313 0.88    -       0       transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_scaf115    AUGUSTUS        exon    1563351 1564313 .       -       .       transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_scaf115    AUGUSTUS        start_codon     1564311 1564313 .       -       0       transcript_id "jg29579.t1"; gene_id "jg29579";
CsWA_chr04      AUGUSTUS        gene    6431667 6433016 .       +       .       jg761
CsWA_chr04      AUGUSTUS        transcript      6431667 6433016 .       +       .       transcript_id "jg761.t1"; gene_id "jg761"
CsWA_chr04      AUGUSTUS        start_codon     6431667 6431669 .       +       0       transcript_id "jg761.t1"; gene_id "jg761";
CsWA_chr04      AUGUSTUS        CDS     6431667 6433016 0.94    +       0       transcript_id "jg761.t1"; gene_id "jg761";
CsWA_chr04      AUGUSTUS        exon    6431667 6433016 .       +       .       transcript_id "jg761.t1"; gene_id "jg761";
CsWA_chr04      AUGUSTUS        stop_codon      6433014 6433016 .       +       0       transcript_id "jg761.t1"; gene_id "jg761";
CsWA_scaf115    AUGUSTUS        gene    4180987 4181720 .       +       .       jg31437
CsWA_scaf115    AUGUSTUS        transcript      4180987 4181720 .       +       .       transcript_id "jg31437.t1"; gene_id "jg31437"
CsWA_scaf115    AUGUSTUS        start_codon     4180987 4180989 .       +       0       transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115    AUGUSTUS        CDS     4180987 4181063 0.59    +       0       transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115    AUGUSTUS        exon    4180987 4181063 .       +       .       transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115    AUGUSTUS        intron  4181064 4181137 .       +       .       transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115    AUGUSTUS        CDS     4181138 4181720 0.54    +       1       transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115    AUGUSTUS        exon    4181138 4181720 .       +       .       transcript_id "jg31437.t1"; gene_id "jg31437";
CsWA_scaf115    AUGUSTUS        stop_codon      4181718 4181720 .       +       0       transcript_id "jg31437.t1"; gene_id "jg31437";

Is there anything wrong with that?
Thank you in advance
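
In the excerpt above, the `gene` rows carry a bare ID (e.g. `jg761`) in the attribute column rather than `gene_id`/`transcript_id` pairs. Whether that is what causes the failure is not established here. The sketch below only flags such rows so they can be inspected. The name `lines_missing_ids` is hypothetical, and the columns are assumed to be tab-separated in the real file.

```python
import re

# Matches GTF-style attributes of the form: key "value"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')
REQUIRED = {"gene_id", "transcript_id"}


def lines_missing_ids(gtf_lines):
    """Return (line_number, reason) for rows lacking gene_id/transcript_id."""
    flagged = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            flagged.append((number, "expected 9 tab-separated fields"))
            continue
        keys = {key for key, _ in ATTR_RE.findall(fields[8])}
        missing = sorted(REQUIRED - keys)
        if missing:
            flagged.append((number, "missing " + ", ".join(missing)))
    return flagged


# Two rows from the excerpt above, re-typed with tabs as separators.
sample = [
    "CsWA_chr04\tAUGUSTUS\tgene\t6431667\t6433016\t.\t+\t.\tjg761",
    "CsWA_chr04\tAUGUSTUS\texon\t6431667\t6433016\t.\t+\t.\t"
    'transcript_id "jg761.t1"; gene_id "jg761";',
]
print(lines_missing_ids(sample))
```

Rows reported by this check could be dropped or rewritten before rerunning, which would at least rule the attribute format in or out as the source of the missing transcript files.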
