mpdunne / orthofiller
OrthoFiller: Identifying missing annotations for evolutionarily conserved genes.
License: GNU General Public License v3.0
Hi there,
Sorry to bother you with another issue. I know you have mentioned this code is not being maintained.
I just ran into an error at step 2.6, where the following errors are repeated hundreds of times (each time for a different file):
Error: Failed to open HMM file ortho_output/working/orthogroups/hmm/OG0013476.hmm for writing
and
Alignment input open failed.
couldn't open ortho_output_mikado_backup/working/orthogroups/alignments/OG0008511_NucAlignment.fasta for reading
This then causes step 2.7 to crash.
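One thing worth ruling out for the "for writing" failure is an output folder that is missing or not writable. A minimal diagnostic sketch, assuming the hmm and alignments folders sit side by side under working/orthogroups as in the paths above; check_writable_dirs is a hypothetical helper, not part of OrthoFiller. The two messages also point at different output folders (ortho_output and ortho_output_mikado_backup), which may be worth checking.

```python
import os

def check_writable_dirs(base, subdirs=("hmm", "alignments")):
    """Report missing or unwritable subfolders of an orthogroups working folder.

    Hypothetical helper: base would be e.g. "ortho_output/working/orthogroups".
    """
    problems = []
    for sub in subdirs:
        path = os.path.join(base, sub)
        if not os.path.isdir(path):
            problems.append("missing directory: " + path)
        elif not os.access(path, os.W_OK):  # permission check for the current user
            problems.append("not writable: " + path)
    return problems
```

Usage: print(check_writable_dirs("ortho_output/working/orthogroups")); an empty list means both folders exist and are writable.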
I also get a small error earlier in the pipeline; I'm not sure whether it has anything to do with this. It occurs right before step 1.1:
Error: Sorted input specified, but the file - has the following out of order record
NC_000002.12 BestRefSeq CDS 41608 46385 . - 2 transcript_id "rna-NM_001077710.3"; gene_id "gene-FAM110C"; gene_name "FAM110C";
sort: write failed: 'standard output': Broken pipe
sort: write error
Error: Sorted input specified, but the file - has the following out of order record
NC_000072.7 BestRefSeq CDS 3372520 3377259 . - 0 transcript_id "rna-NM_010156.3"; gene_id "gene-Samd9l"; gene_name "Samd9l";
sort: write failed: 'standard output': Broken pipe
sort: write error
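The "file -" in this message is standard input (here fed by sort), and the error means bedtools was told the input is sorted but found a record out of order. A simplified sketch of such an order check, assuming records must be grouped by contig with non-decreasing start positions inside each contig; bedtools may additionally require the contig order to match another input or a genome file, which this does not model. first_unsorted_record is a hypothetical helper.

```python
def first_unsorted_record(lines):
    """Return the first GTF line that breaks (contig, start) order, or None."""
    seen_contigs = set()
    prev_contig, prev_start = None, -1
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        contig, start = fields[0], int(fields[3])  # column 4 is the start coordinate
        if contig != prev_contig:
            if contig in seen_contigs:
                return line  # contig reappears after a different contig
            seen_contigs.add(contig)
            prev_contig, prev_start = contig, start
        elif start < prev_start:
            return line  # start position went backwards within a contig
        else:
            prev_start = start
    return None
```

Running this on the GTF that OrthoFiller passes to bedtools would show the first offending record without waiting for the pipeline to reach that point.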
I also believe that step 2.5 is supposed to create alignment files with the suffix _NucAlignment.fasta, although no such files exist in my orthogroups/alignments directory (the _ProteinAlignment.fasta files do exist). However, step 2.5 shows no error messages, and it appears that all orthogroups have been processed.
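To see which orthogroups are affected, a short sketch that lists orthogroups with a _ProteinAlignment.fasta file but no matching _NucAlignment.fasta. It assumes both files live in the same orthogroups/alignments folder and share the orthogroup ID as a prefix, as in the file names quoted above; missing_nuc_alignments is a hypothetical helper.

```python
from pathlib import Path

def missing_nuc_alignments(align_dir):
    """List orthogroup IDs that have a protein alignment but no nucleotide alignment."""
    align_dir = Path(align_dir)
    prot_suffix, nuc_suffix = "_ProteinAlignment.fasta", "_NucAlignment.fasta"
    missing = []
    for prot in sorted(align_dir.glob("*" + prot_suffix)):
        og = prot.name[: -len(prot_suffix)]  # e.g. "OG0008511"
        if not (align_dir / (og + nuc_suffix)).exists():
            missing.append(og)
    return missing
```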
I am not sure how to fix this issue; do you have any ideas? I am currently digging through the code to figure out what the individual functions are doing. I am also hoping that OrthoFiller has checkpoints that would allow the run to resume at step 2.6, because it has already been running for just over a week. I'm not sure whether this is the case, and I'm wary of testing it in case I overwrite part of the existing output. Do you know whether OrthoFiller resumes by default when it is rerun with the same command-line arguments?
Thanks so much,
Zoe
Please update the README to list the required software versions:
- hmmer 3.1b2: newer versions recognize the hmmerfm database format implicitly, whereas orthofiller.py currently references the format explicitly in the nhmmer command.
- orthofinder 2.2.3: newer versions output .tsv, NOT .csv.
- bedtools 2.25.0: newer versions do not play well with orthofiller.
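A small sketch for checking an installation against the versions listed above. parse_version and version_mismatches are hypothetical helpers; how each tool prints its version is not covered here, so the regular expression only assumes the version appears somewhere in the text as a token like 2.25.0 or 3.1b2.

```python
import re

# Versions reported in this issue as working with OrthoFiller.
TESTED_VERSIONS = {"hmmer": "3.1b2", "orthofinder": "2.2.3", "bedtools": "2.25.0"}

# Matches tokens such as 2.25.0 or 3.1b2 (assumed version layout).
VERSION_RE = re.compile(r"\d+\.\d+(?:\.\d+|b\d+)?")

def parse_version(text):
    """Return the first version-like token in text, or None."""
    m = VERSION_RE.search(text)
    return m.group(0) if m else None

def version_mismatches(installed):
    """Map tool -> (installed, tested) for every tool that differs from TESTED_VERSIONS."""
    return {
        tool: (installed.get(tool), tested)
        for tool, tested in TESTED_VERSIONS.items()
        if installed.get(tool) != tested
    }
```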
I recently ran OrthoFiller with 28 cores, and it crashed after some time because of a problem with an input file. I was also tracking the program's run time, which is reported once the node considers the job terminated (correctly or not). However, 30-40 minutes later, the ghost jobs were still visible in top (using 0% CPU and 0% RAM).
Hey there,
I just tried to install orthofiller, and it seems the README does not currently capture all the dependencies or the whole setup procedure. To get it to run, it is not enough to have the OrthoFinder location in your $PATH; you actually need the complete uncompiled OrthoFinder source in the folder from which you start orthofiller. Otherwise it constantly crashes because it cannot find the mcl scripts or OrthoFinder.py itself. I solved it by downloading the source version and putting OrthoFinder.py plus the complete scripts folder into the working directory from which I started orthofiller.
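Following this workaround, a quick pre-flight sketch that checks whether the launch folder contains OrthoFinder.py and the scripts folder before starting a long run. missing_orthofinder_files is a hypothetical helper, and it only checks the two items named above, not their contents.

```python
from pathlib import Path

def missing_orthofinder_files(workdir="."):
    """Return which of OrthoFinder.py and scripts/ are absent from workdir."""
    workdir = Path(workdir)
    missing = []
    if not (workdir / "OrthoFinder.py").is_file():
        missing.append("OrthoFinder.py")
    if not (workdir / "scripts").is_dir():
        missing.append("scripts/")
    return missing
```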
Hope that is of some help!
Cheers,
Bastian
Hi, I am trying to use OrthoFiller, and I noticed that in the runOrthoFil.sh script, AUGUSTUS_CONFIG_PATH is set from a variable ($cfg_augustus) that is never declared; the variable that is actually declared is loc_augustus:
loc_augustus="" # path to augustus config directory
export AUGUSTUS_CONFIG_PATH=$cfg_augustus
I have used orthofiller and have the following issues:
"0.1. Checking installed programs" failed to test some shell commands.
I'm sure that I have these shell commands installed, but this step somehow failed, so I skipped it.
In the step "1.2. Preparing gtf files for Augustus training", I have errors:
***** ERROR: illegal character '.' found in integer conversion of string ".". Exiting...
***** ERROR: illegal character '.' found in integer conversion of string ".". Exiting...
I have googled it and it seems that this may be related to bedtools.
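In a GTF file, columns 4 and 5 (start and end) must be integers, while columns such as score and frame may legitimately contain ".". A quick sketch to locate records where a "." (or anything else non-numeric) ended up in a coordinate column, assuming that is where bedtools trips; the error could also come from another column or another input file. non_integer_coordinates is a hypothetical helper.

```python
def non_integer_coordinates(lines):
    """Yield (line_number, line) for GTF records whose start or end is not an integer."""
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        # Records with fewer than 9 tab-separated columns are malformed too.
        if len(fields) < 9 or not (fields[3].isdigit() and fields[4].isdigit()):
            yield n, line
```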
For the orthogroup statistics, the number of genes seems to be the number of isoforms. Is orthofiller aware of, and able to deal with, gene structures with multiple isoforms?
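Whether OrthoFiller collapses isoforms is not stated in this thread. One common preprocessing step, shown here as a sketch and not as OrthoFiller's own behaviour, is to keep a single transcript per gene, for example the one with the longest total CDS. It assumes attributes in the transcript_id "..."; gene_id "..." style seen in the GTF records quoted earlier; longest_isoforms is a hypothetical helper.

```python
import re
from collections import defaultdict

ATTR = re.compile(r'(\w+) "([^"]*)"')  # key "value" pairs in column 9

def longest_isoforms(gtf_lines):
    """Map each gene_id to its transcript_id with the longest total CDS length."""
    cds_len = defaultdict(int)
    gene_of = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        attrs = dict(ATTR.findall(f[8]))
        tx = attrs["transcript_id"]
        gene_of[tx] = attrs["gene_id"]
        cds_len[tx] += int(f[4]) - int(f[3]) + 1  # GTF coordinates are 1-based, inclusive
    best = {}
    for tx, length in cds_len.items():
        gene = gene_of[tx]
        if gene not in best or length > cds_len[best[gene]]:
            best[gene] = tx  # ties keep the first transcript seen
    return best
```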
Best,
Quan
Hello!
I am hoping to use OrthoFiller and am having an issue with my reference species data. When I run OrthoFiller, I get the following error:
Gtf file /path/species_files/genomic.gtf contains coordinates that do not exist in genome file $genome. Please adjust and try again.
I have installed all of the required dependencies (including those listed in the previous "Issues" post) and have cleaned the headers in my fasta file. I have also double-checked that the gtf contig names are the same as those in the fasta file. I suspect that for some reason OrthoFiller isn't recognizing my genome paths, although I'm not sure why, as I have formatted the file as requested.
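To check the message's claim independently of OrthoFiller, a sketch that measures each contig in the genome FASTA and lists GTF records whose contig is absent or whose coordinates fall outside it. It assumes the contig name is the FASTA header up to the first whitespace, which may not be how OrthoFiller matches names; contig_lengths and out_of_range_records are hypothetical helpers.

```python
def contig_lengths(fasta_lines):
    """Return {contig_name: length}, taking the name as the header up to the first whitespace."""
    lengths, name = {}, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def out_of_range_records(gtf_lines, lengths):
    """Yield GTF lines whose contig is missing from the genome or whose coordinates exceed it."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        contig, start, end = f[0], int(f[3]), int(f[4])
        if contig not in lengths or start < 1 or end > lengths[contig]:
            yield line
```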
Do you have any idea why I might be encountering such an error?
Thanks,
Zoe
I am experiencing an error with the gff_to_gtf_safe.py script. I am working with the gff output of the MAKER software and have successfully used clean_gff, but when I try to convert to gtf using the wrapper script, I get the error:
Traceback (most recent call last):
  File "gff_to_gtf.py", line 76, in
    printGTF(Transcriptdb)
  File "gff_to_gtf.py", line 48, in printGTF
    for idz, ex_cod in enumerate(exons):
TypeError: iteration over a 0-d array
I get this error with all of my files and have been unable to determine the cause. I have attached one of my files for reference. How can I resolve this error?
ExampleGFF.txt
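The last line of the traceback means the script tried to loop over a NumPy array with zero dimensions, i.e. a single bare value rather than a list of exons. Without the full gff_to_gtf.py source, a plausible but unconfirmed cause is a transcript for which a single value was collected instead of a list, for example if the MAKER exon features do not link to their transcripts the way the script expects. The sketch only reproduces the error and shows one common guard, np.atleast_1d; enumerate_exons is a hypothetical helper, not a patch to the script.

```python
import numpy as np

def enumerate_exons(exons):
    """Enumerate exons, treating a 0-d array (one bare value) as a single exon.

    np.atleast_1d turns shape () into shape (1,) and leaves 1-d and higher arrays unchanged.
    """
    return list(enumerate(np.atleast_1d(exons)))

# Reproduces the traceback's message: a 0-d array cannot be iterated.
try:
    enumerate(np.array("exon_1"))
except TypeError as err:
    print("TypeError:", err)

print(enumerate_exons(np.array("exon_1")))       # one exon
print(enumerate_exons(np.array(["e1", "e2"])))   # two exons, unchanged
```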
With one of the file sets I'm using in the analysis, I get this error:
/software/python/Python2.7/lib/python2.7/site-packages/Bio/Seq.py:2071: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
I checked my files, and for many transcripts in the GTF file the total length is not a multiple of 3 when summing the exon features (because exons may include UTRs), whereas it is when summing the CDS features. How does the program actually handle this? Does it read the exons, join them, and then estimate the coding region by searching for start and stop codons?
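To find which transcripts are affected, a sketch that sums the lengths of one feature type per transcript_id and flags totals that are not a multiple of 3, so the exon-based and CDS-based views can be compared directly. It assumes 1-based inclusive GTF coordinates and transcript_id attributes; it says nothing about how OrthoFiller itself derives coding sequences, and feature_lengths and partial_codon_transcripts are hypothetical helpers.

```python
import re
from collections import defaultdict

TX_ID = re.compile(r'transcript_id "([^"]+)"')

def feature_lengths(gtf_lines, feature):
    """Sum the lengths of one feature type (e.g. "exon" or "CDS") per transcript_id."""
    totals = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != feature:
            continue
        m = TX_ID.search(f[8])
        if m:
            totals[m.group(1)] += int(f[4]) - int(f[3]) + 1  # 1-based, inclusive
    return dict(totals)

def partial_codon_transcripts(gtf_lines, feature="CDS"):
    """Return sorted transcript_ids whose summed feature length is not a multiple of 3."""
    return sorted(tx for tx, n in feature_lengths(gtf_lines, feature).items() if n % 3)
```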
Also, I think it would be helpful for the end user to see, in the standard error, the name of the gene that triggered the Biopython warning, so they can check it immediately and perhaps grep it out of the file.
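Along the lines of this suggestion, a sketch of how a translation call could be wrapped so that any warning it raises is re-emitted with the gene name attached. translate_with_context is hypothetical and is not how OrthoFiller currently reports warnings; translate stands for whatever function performs the translation (with Biopython, something like lambda s: Seq(s).translate()).

```python
import warnings

def translate_with_context(gene_id, seq, translate):
    """Call translate(seq) and re-emit any warnings it raises, prefixed with gene_id."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        protein = translate(seq)
    for w in caught:
        # Same warning category, but the message now names the gene.
        warnings.warn("%s: %s" % (gene_id, w.message), w.category, stacklevel=2)
    return protein
```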
I have tried to run OrthoFiller.py (v1.1.4) with OrthoFinder versions 2.2.7 and 2.3.3 on the sample data. In both cases, after OrthoFinder finishes, I get the following:
output_folder/sequences/aa Can't find Orthogroup output file... exiting...
I can confirm that the output directory "aa" contains the OrthoFinder output directory.
2445872 Dec 9 13:33 Ashgo1_1_AssemblyScaffolds.fasta.aa.fasta
3150717 Dec 9 13:33 Debha1_AssemblyScaffolds.fasta.aa.fasta
2579674 Dec 9 13:34 Klula1_AssemblyScaffolds.fasta.aa.fasta
4096 Dec 9 13:34 OrthoFinder
3023784 Dec 9 13:33 Sacce_S288C.genome.nomt.fa.aa.fasta
3292120 Dec 9 13:33 Yarli1_AssemblyScaffolds.fasta.aa.fasta
I believe this is an OrthoFinder version difference, because newer versions of OrthoFinder do not produce OrthologousGroups.csv or Orthogroups.csv files, but OrthologousGroups.tsv and Orthogroups.tsv instead.
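A sketch of a lookup that accepts either naming, searching the OrthoFinder output folder recursively for the four file names mentioned here. find_orthogroups_file is a hypothetical helper; finding the .tsv file does not by itself mean OrthoFiller's parser can read it, since the two formats may differ in layout.

```python
from pathlib import Path

# File names from the report above: older OrthoFinder releases wrote the .csv
# variants, newer ones write the .tsv variants.
CANDIDATES = ("Orthogroups.csv", "OrthologousGroups.csv",
              "Orthogroups.tsv", "OrthologousGroups.tsv")

def find_orthogroups_file(results_dir):
    """Return the first orthogroup table found under results_dir, or None."""
    for name in CANDIDATES:
        hits = sorted(Path(results_dir).rglob(name))  # recursive search
        if hits:
            return hits[0]
    return None
```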
Can you provide the best version of OrthoFinder to use with OrthoFiller?
The program seems unable to find a file that it should have generated internally:
grep: path/to/file.fa.proposedGenes: No such file or directory
(I substituted the original file with path/to/file)
Checking through the log is quite tedious because of the sheer number of progress-report lines, so I may have missed the error line. What happened here? Is it not creating this file because it doesn't find any proposed genes, or for some other reason?
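To skip past the progress lines, a small sketch that keeps only log lines that look like problems, with their line numbers. The patterns are guesses based on the messages quoted in these issues (such as "Error:" and "No such file or directory"); problem_lines is a hypothetical helper.

```python
import re

# Case-insensitive patterns for lines that usually signal a problem (assumed).
PROBLEM = re.compile(r"error|exception|traceback|no such file|failed", re.IGNORECASE)

def problem_lines(log_lines):
    """Return (line_number, text) pairs for log lines matching PROBLEM."""
    return [(n, line.rstrip("\n")) for n, line in enumerate(log_lines, start=1)
            if PROBLEM.search(line)]
```

Usage: with open("orthofiller.log") as fh: print(problem_lines(fh)), where the log file name is illustrative.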