mpdunne / orthofiller

OrthoFiller: Identifying missing annotations for evolutionarily conserved genes.

License: GNU General Public License v3.0

orthofiller's Issues

Error opening HMM files for writing

Hi there,

Sorry to bother you with another issue. I know you have mentioned this code is not being maintained.

I just ran into errors at step 2.6, where the following error is repeated hundreds of times (each time for a different file):

Error: Failed to open HMM file ortho_output/working/orthogroups/hmm/OG0013476.hmm for writing
and

Alignment input open failed.
   couldn't open ortho_output_mikado_backup/working/orthogroups/alignments/OG0008511_NucAlignment.fasta for reading

This then causes step 2.7 to crash.

I also have a small error earlier up the pipeline - not sure if this would have anything to do with it. It occurs right before step 1.1.

Error: Sorted input specified, but the file - has the following out of order record
NC_000002.12    BestRefSeq      CDS     41608   46385   .       -       2       transcript_id "rna-NM_001077710.3"; gene_id "gene-FAM110C"; gene_name "FAM110C";
sort: write failed: 'standard output': Broken pipe
sort: write error
Error: Sorted input specified, but the file - has the following out of order record
NC_000072.7     BestRefSeq      CDS     3372520 3377259 .       -       0       transcript_id "rna-NM_010156.3"; gene_id "gene-Samd9l"; gene_name "Samd9l";
sort: write failed: 'standard output': Broken pipe
sort: write error

I also believe that step 2.5 is supposed to create alignment files with the suffix _NucAlignment.fasta, but no such files exist in my orthogroups/alignments directory (although _ProteinAlignment.fasta files do exist). However, step 2.5 shows no error messages, and it appears that all orthogroups have been processed.

I am not sure how to fix this issue - do you have any ideas? I am currently trying to dig through the code to figure out what the individual functions are doing. I am also hoping that OrthoFiller has checkpoints that will allow the run to continue at step 2.6 because it has already been running for just over a week. I'm not sure if this is the case, and I'm worried about testing this out in case I overwrite any part of the pipeline. Do you know if this is a default response of OrthoFiller if the same arguments are specified on the command line?

Thanks so much,
Zoe
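
A minimal sketch for checking the sort order of a GTF file, assuming the "Sorted input specified" message above means that records are not grouped by sequence name and ordered by start position. The function names first_out_of_order and sort_gtf are hypothetical and are not part of OrthoFiller or bedtools.

```python
def _data_lines(lines):
    """Yield (line_number, stripped_line) for non-empty, non-comment lines."""
    for number, line in enumerate(lines, 1):
        text = line.rstrip("\n")
        if text.strip() and not text.startswith("#"):
            yield number, text


def first_out_of_order(lines):
    """Return (line_number, record) for the first unsorted record, else None.

    A record counts as out of order if its start is lower than the previous
    start on the same sequence, or if its sequence name reappears after a
    different sequence has already been seen.
    """
    seen = set()
    prev_chrom, prev_start = None, -1
    for number, text in _data_lines(lines):
        fields = text.split("\t")
        chrom, start = fields[0], int(fields[3])
        if chrom != prev_chrom:
            if chrom in seen:
                return number, text
            seen.add(chrom)
            prev_chrom = chrom
        elif start < prev_start:
            return number, text
        prev_start = start
    return None


def sort_gtf(lines):
    """Return the data records sorted by sequence name, then start position."""
    records = [text for _, text in _data_lines(lines)]
    return sorted(records, key=lambda r: (r.split("\t")[0], int(r.split("\t")[3])))
```

Running first_out_of_order on the input GTF before starting the pipeline would show whether the file itself is unsorted; it does not tell you whether an intermediate file produced by OrthoFiller is the one bedtools is complaining about.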

IMPORTANT DEPENDENCY VERSION INFORMATION

Please update the readme to reflect necessary software versions.

hmmer 3.1b2 - newer versions recognize hmmerfm database format implicitly; currently orthofiller.py explicitly references the format in the nhmmer command

orthofinder 2.2.3 - newer versions output .tsv NOT .csv

bedtools 2.25.0 - newer versions do not play well with orthofiller
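
A small sketch for comparing installed tools against the versions listed above. It only parses a version string out of whatever text each tool prints; how you obtain that text (for example from each tool's version or help output) is left to you. The PINNED table simply restates the versions from this issue, and the function names are hypothetical.

```python
import re

# Versions reported as working in this issue (not an official requirement list).
PINNED = {
    "hmmer": "3.1b2",
    "orthofinder": "2.2.3",
    "bedtools": "2.25.0",
}

# Matches strings such as 2.25.0, 2.2.3 or 3.1b2.
_VERSION = re.compile(r"\d+\.\d+(?:\.\d+)*(?:[a-z]+\d+)?")


def extract_version(text):
    """Return the first version-like token in text, or None if there is none."""
    match = _VERSION.search(text)
    return match.group(0) if match else None


def matches_pinned(tool, banner):
    """True if the version found in banner equals the pinned version for tool."""
    return extract_version(banner) == PINNED[tool]
```

For example, matches_pinned("bedtools", banner) can be called with the text printed by the installed bedtools, and a False result flags a version that differs from the one listed here.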

Program crashes but ghost jobs keep going on

I recently ran an OrthoFiller process with 28 cores, which crashed after some time because of an input file problem. I was also tracking the run time of the program, which produces output once the node considers the job terminated (correctly or not). However, after 30-40 minutes, the ghost jobs were still visible with top (using 0% CPU and 0% of the RAM).

Dependencies

Hey there,
I just tried to install OrthoFiller and it seems the README does not fully capture the dependencies or the whole setup procedure at this point. To get it to run, it is not enough to have the OrthoFinder location in your $PATH; you actually need the full uncompiled OrthoFinder source in the folder from which you start OrthoFiller.

Otherwise it crashes constantly because it cannot find the mcl scripts or OrthoFinder.py itself. I solved it by downloading the source version and then putting OrthoFinder.py plus the complete scripts folder into the working directory from which I started OrthoFiller.

Hope that is of some help!

Cheers,
Bastian

Augustus config path obtained from wrong variable

Hi, I am trying to use OrthoFiller and I noticed that in the runOrthoFil.sh script AUGUSTUS_CONFIG_PATH is set from a variable that is never declared: loc_augustus is defined, but cfg_augustus is the one that gets exported.

loc_augustus="" # path to augustus config directory
export AUGUSTUS_CONFIG_PATH=$cfg_augustus

Several issues

I have used OrthoFiller and have the following issues:

1. "0.1. Checking installed programs" failed to test some shell commands. I'm sure that I have these shell commands, but this step somehow failed, so I skipped it.
2. In the step "1.2. Preparing gtf files for Augustus training", I get these errors:
***** ERROR: illegal character '.' found in integer conversion of string ".". Exiting...
***** ERROR: illegal character '.' found in integer conversion of string ".". Exiting...
I have googled it and it seems that this may be related to bedtools.
3. For Orthogroup statistics, the number of genes seems to be the number of isoforms. Is OrthoFiller aware of, and able to deal with, gene structures with multiple isoforms?

Best,
Quan
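
A quick way to see whether the isoform question applies to a given annotation is to count distinct gene_id and transcript_id values in the GTF. This is a sketch only: it assumes the usual GTF attribute syntax (key "value";) in column 9, the function name is hypothetical, and it says nothing about how OrthoFiller itself counts genes.

```python
import re

_GENE = re.compile(r'gene_id "([^"]+)"')
_TRANSCRIPT = re.compile(r'transcript_id "([^"]+)"')


def count_genes_and_transcripts(gtf_lines):
    """Return (number of distinct gene_ids, number of distinct transcript_ids)."""
    genes, transcripts = set(), set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        gene = _GENE.search(fields[8])
        transcript = _TRANSCRIPT.search(fields[8])
        if gene:
            genes.add(gene.group(1))
        if transcript:
            transcripts.add(transcript.group(1))
    return len(genes), len(transcripts)
```

If the second number is much larger than the first, the annotation contains multiple transcripts per gene, and a per-sequence count would then reflect isoforms rather than genes.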

$genome not recognized in reference tdv file

Hello!

I am hoping to use OrthoFiller and am having an issue with my reference species data. When I run OrthoFiller, I get the following error:

Gtf file /path/species_files/genomic.gtf contains coordinates that do not exist in genome file $genome. Please adjust and try again.

I have installed all of the requested dependencies (including those listed in the previous "Issues" post), and have cleaned the headers in my fasta file. I have also double-checked that the gtf contig names are the same as those in the fasta file. I suspect that for some reason OrthoFiller isn't recognizing my genome paths. I'm not sure why, as I have formatted the file as requested.

Do you have any idea why I might be encountering such an error?

Thanks,
Zoe
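
One way to test the error message independently of OrthoFiller is to compare every GTF record against the sequence lengths in the FASTA file. The sketch below assumes that the sequence name is the first whitespace-delimited token of each FASTA header and that GTF coordinates are 1-based and inclusive; both function names are hypothetical. It cannot explain why the message prints $genome literally.

```python
def fasta_lengths(fasta_lines):
    """Map each sequence name (first token of the header) to its length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_problems(gtf_lines, lengths):
    """Return (line_number, description) for records that do not fit the genome."""
    problems = []
    for number, line in enumerate(gtf_lines, 1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        if chrom not in lengths:
            problems.append((number, "unknown sequence %s" % chrom))
        elif start < 1 or start > end or end > lengths[chrom]:
            problems.append((number, "coordinates outside 1..%d" % lengths[chrom]))
    return problems
```

An empty result from gtf_problems would suggest that the GTF and FASTA agree with each other, which would point towards how the paths are being passed in rather than the files themselves.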

Error in gff_to_gtf_safe.py

I am experiencing an error with the gff_to_gtf_safe.py script. I am working with the gff output of the MAKER software and have successfully used clean_gff but when I try to convert to gtf using the wrapper script I get the error:

Traceback (most recent call last):
File "gff_to_gtf.py", line 76, in <module>
printGTF(Transcriptdb)
File "gff_to_gtf.py", line 48, in printGTF
for idz, ex_cod in enumerate(exons):
TypeError: iteration over a 0-d array

I get this error with all of my files and have been unable to determine the cause. I have attached one of my files for reference. How can I resolve this error?
ExampleGFF.txt

Sequences not multiple of 3: poor stderr information

With one of the file sets I'm using in the analysis, I get this error:

/software/python/Python2.7/lib/python2.7/site-packages/Bio/Seq.py:2071: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.

I checked my files, and for many genes in the GTF file the total length is not a multiple of 3 when computed from the exon features (which may contain UTRs), while it is a multiple of 3 when computed from the CDS features. How does the program actually handle this? Does it read the exons, join them, and then estimate the coding region by searching for start and stop codons?

Also, I think it would be helpful for the end-user to see the gene name that generated the Biopython warning in the standard error, to do an immediate check and perhaps grep it out of the file.
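
Until the warning names the offending gene, the transcripts can be found directly from the GTF. The following is a sketch under the assumption that the warning is triggered by transcripts whose summed CDS length is not a multiple of three; whether OrthoFiller builds its sequences from CDS or exon features is exactly the open question above. The function name is hypothetical.

```python
import re
from collections import defaultdict

_TRANSCRIPT = re.compile(r'transcript_id "([^"]+)"')


def partial_cds_transcripts(gtf_lines, feature="CDS"):
    """Return {transcript_id: total_length} where the length is not a multiple of 3.

    Lengths are summed over all records of the given feature type, using
    1-based inclusive GTF coordinates (end - start + 1).
    """
    totals = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = _TRANSCRIPT.search(fields[8])
        if match:
            totals[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return {tid: total for tid, total in totals.items() if total % 3 != 0}
```

Calling it once with feature="CDS" and once with feature="exon" reproduces the comparison described above and gives a list of transcript IDs that can be grepped out of the file.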

Version of Orthofinder?

I have tried to run OrthoFiller.py (v1.1.4) with OrthoFinder versions 2.2.7 and 2.3.3 on the sample data. In both cases, after OrthoFinder finishes I get the following:
output_folder/sequences/aa Can't find Orthogroup output file... exiting...

I can confirm that the output directory "aa" contains the OrthoFinder output directory.
2445872 Dec 9 13:33 Ashgo1_1_AssemblyScaffolds.fasta.aa.fasta
3150717 Dec 9 13:33 Debha1_AssemblyScaffolds.fasta.aa.fasta
2579674 Dec 9 13:34 Klula1_AssemblyScaffolds.fasta.aa.fasta
4096 Dec 9 13:34 OrthoFinder
3023784 Dec 9 13:33 Sacce_S288C.genome.nomt.fa.aa.fasta
3292120 Dec 9 13:33 Yarli1_AssemblyScaffolds.fasta.aa.fasta
I believe this is an OrthoFinder version difference, because the newer versions do not produce OrthologousGroups.csv or Orthogroups.csv files but OrthologousGroups.tsv and Orthogroups.tsv instead.

Can you provide the best version of OrthoFinder to use with OrthoFiller?
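
To illustrate the suspected cause, here is a sketch of a lookup that accepts either extension. It is not how OrthoFiller currently locates the file; the function name is hypothetical, and the four file names are simply the ones mentioned in this issue.

```python
import os

# File names mentioned in this issue, older .csv names first.
CANDIDATES = (
    "Orthogroups.csv",
    "OrthologousGroups.csv",
    "Orthogroups.tsv",
    "OrthologousGroups.tsv",
)


def find_orthogroups_file(root):
    """Walk root and return the path of the first orthogroups file found, or None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in CANDIDATES:
            if name in filenames:
                return os.path.join(dirpath, name)
    return None
```

Finding a .tsv file with this lookup would support the version explanation. Note that locating the file is only half of the problem: code written for the .csv layout may also need to be adapted to the delimiter used in the .tsv file.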

proposedGenes file not found

The program seems unable to find a file that it should generate internally:
grep: path/to/file.fa.proposedGenes: No such file or directory

(I substituted the original file with path/to/file)

Checking through the log is quite tedious because of the huge number of progress report lines, so I am not sure whether I missed the error line. What happened here? Is the file not being created because no proposed genes were found, or for some other reason?

orthofiller's hmmer commands raise a makehmmerdb bug that may lead to memory reallocation errors

See: EddyRivasLab/hmmer#213