thh32 / protologger
Protologger is a tool for the description of novel taxa by providing taxonomic, functional and ecological insights
Hi!
I cannot send this via Galaxy, so I am posting my issue here.
I am trying to run Protologger via Galaxy, but the job gets cancelled.
I am getting this error report:
usearch v5.2.32 (32-bit, 2.0Gb avail / 528Gb RAM)
(C) Copyright 2010-11 Robert C. Edgar, all rights reserved.
License: [email protected]
00:00 20Mb 0.1% Reading /DATA/galaxy-run/galaxy/database/files/013/dataset_13419.dat
00:00 20Mb 0.1% Reading /DATA/galaxy-run/galaxy/database/files/013/dataset_13419.dat
00:00 33Mb 100.0% Reading /DATA/galaxy-run/galaxy/database/files/013/dataset_13419.dat
00:00 33Mb 1 sequences
00:00 35Mb 0.1% Reading /DATA/galaxy-run/galaxy/tools/protologger/bin/16S-SILVA/LTP-DB/LTPs132_SSU_compressed.fasta, 0 seqs
WARNING: Invalid FASTA file '/DATA/galaxy-run/galaxy/tools/protologger/bin/16S-SILVA/LTP-DB/LTPs132_SSU_compressed.fasta', non-letter ' ' in sequence >FJ611848-1-1348-1348bp-rna-Erwinia gerundensis-Erwiniaceae
00:01 103Mb 20.2% Reading /DATA/galaxy-run/galaxy/tools/protologger/bin/16S-SILVA/LTP-DB/LTPs132_SSU_compressed.fasta, 2788 seqs
00:01 316Mb 100.0% Reading /DATA/galaxy-run/galaxy/tools/protologger/bin/16S-SILVA/LTP-DB/LTPs132_SSU_compressed.fasta, 13.9k seqs
00:01 300Mb 100.0% Chimera search /DATA/galaxy-run/galaxy/database/files/013/dataset_13419.dat, 0/1 found (0.0%)
[CheckM - tree] Placing bins in reference genome tree.
Identifying marker genes in 1 bins with 1 threads:
Finished processing 0 of 1 (0.00%) bins.
Finished processing 1 of 1 (100.00%) bins.
Saving HMM info to file.
Calculating genome statistics for 1 bins with 1 threads:
Finished processing 0 of 1 (0.00%) bins.
Finished processing 1 of 1 (100.00%) bins.
Extracting marker genes to align.
Parsing HMM hits to marker genes:
Finished parsing hits for 1 of 1 (100.00%) bins.
Extracting 43 HMMs with 1 threads:
Finished extracting 0 of 43 (0.00%) HMMs.
Finished extracting 1 of 43 (2.33%) HMMs.
Finished extracting 2 of 43 (4.65%) HMMs.
Finished extracting 3 of 43 (6.98%) HMMs.
Finished extracting 4 of 43 (9.30%) HMMs.
Finished extracting 5 of 43 (11.63%) HMMs.
Finished extracting 6 of 43 (13.95%) HMMs.
Finished extracting 7 of 43 (16.28%) HMMs.
Finished extracting 8 of 43 (18.60%) HMMs.
Finished extracting 9 of 43 (20.93%) HMMs.
Finished extracting 10 of 43 (23.26%) HMMs.
Finished extracting 11 of 43 (25.58%) HMMs.
Finished extracting 12 of 43 (27.91%) HMMs.
Finished extracting 13 of 43 (30.23%) HMMs.
Finished extracting 14 of 43 (32.56%) HMMs.
Finished extracting 15 of 43 (34.88%) HMMs.
Finished extracting 16 of 43 (37.21%) HMMs.
Finished extracting 17 of 43 (39.53%) HMMs.
Finished extracting 18 of 43 (41.86%) HMMs.
Finished extracting 19 of 43 (44.19%) HMMs.
Finished extracting 20 of 43 (46.51%) HMMs.
Finished extracting 21 of 43 (48.84%) HMMs.
Finished extracting 22 of 43 (51.16%) HMMs.
Finished extracting 23 of 43 (53.49%) HMMs.
Finished extracting 24 of 43 (55.81%) HMMs.
Finished extracting 25 of 43 (58.14%) HMMs.
Finished extracting 26 of 43 (60.47%) HMMs.
Finished extracting 27 of 43 (62.79%) HMMs.
Finished extracting 28 of 43 (65.12%) HMMs.
Finished extracting 29 of 43 (67.44%) HMMs.
Finished extracting 30 of 43 (69.77%) HMMs.
Finished extracting 31 of 43 (72.09%) HMMs.
Finished extracting 32 of 43 (74.42%) HMMs.
Finished extracting 33 of 43 (76.74%) HMMs.
Finished extracting 34 of 43 (79.07%) HMMs.
Finished extracting 35 of 43 (81.40%) HMMs.
Finished extracting 36 of 43 (83.72%) HMMs.
Finished extracting 37 of 43 (86.05%) HMMs.
Finished extracting 38 of 43 (88.37%) HMMs.
Finished extracting 39 of 43 (90.70%) HMMs.
Finished extracting 40 of 43 (93.02%) HMMs.
Finished extracting 41 of 43 (95.35%) HMMs.
Finished extracting 42 of 43 (97.67%) HMMs.
Finished extracting 43 of 43 (100.00%) HMMs.
Aligning 43 marker genes with 1 threads:
Finished aligning 0 of 43 (0.00%) marker genes.
Finished aligning 1 of 43 (2.33%) marker genes.
Finished aligning 2 of 43 (4.65%) marker genes.
Finished aligning 3 of 43 (6.98%) marker genes.
Finished aligning 4 of 43 (9.30%) marker genes.
Finished aligning 5 of 43 (11.63%) marker genes.
Finished aligning 6 of 43 (13.95%) marker genes.
Finished aligning 7 of 43 (16.28%) marker genes.
Finished aligning 8 of 43 (18.60%) marker genes.
Finished aligning 9 of 43 (20.93%) marker genes.
Finished aligning 10 of 43 (23.26%) marker genes.
Finished aligning 11 of 43 (25.58%) marker genes.
Finished aligning 12 of 43 (27.91%) marker genes.
Finished aligning 13 of 43 (30.23%) marker genes.
Finished aligning 14 of 43 (32.56%) marker genes.
Finished aligning 15 of 43 (34.88%) marker genes.
Finished aligning 16 of 43 (37.21%) marker genes.
Finished aligning 17 of 43 (39.53%) marker genes.
Finished aligning 18 of 43 (41.86%) marker genes.
Finished aligning 19 of 43 (44.19%) marker genes.
Finished aligning 20 of 43 (46.51%) marker genes.
Finished aligning 21 of 43 (48.84%) marker genes.
Finished aligning 22 of 43 (51.16%) marker genes.
Finished aligning 23 of 43 (53.49%) marker genes.
Finished aligning 24 of 43 (55.81%) marker genes.
Finished aligning 25 of 43 (58.14%) marker genes.
Finished aligning 26 of 43 (60.47%) marker genes.
Finished aligning 27 of 43 (62.79%) marker genes.
Finished aligning 28 of 43 (65.12%) marker genes.
Finished aligning 29 of 43 (67.44%) marker genes.
Finished aligning 30 of 43 (69.77%) marker genes.
Finished aligning 31 of 43 (72.09%) marker genes.
Finished aligning 32 of 43 (74.42%) marker genes.
Finished aligning 33 of 43 (76.74%) marker genes.
Finished aligning 34 of 43 (79.07%) marker genes.
Finished aligning 35 of 43 (81.40%) marker genes.
Finished aligning 36 of 43 (83.72%) marker genes.
Finished aligning 37 of 43 (86.05%) marker genes.
Finished aligning 38 of 43 (88.37%) marker genes.
Finished aligning 39 of 43 (90.70%) marker genes.
Finished aligning 40 of 43 (93.02%) marker genes.
Finished aligning 41 of 43 (95.35%) marker genes.
Finished aligning 42 of 43 (97.67%) marker genes.
Finished aligning 43 of 43 (100.00%) marker genes.
Reading marker alignment files.
Concatenating alignments.
Placing 1 bins into the genome tree with pplacer (be patient).
{ Current stage: 0:02:38.590 || Total: 0:02:38.590 }
[CheckM - lineage_set] Inferring lineage-specific marker sets.
Reading HMM info from file.
Parsing HMM hits to marker genes:
Finished parsing hits for 1 of 1 (100.00%) bins.
Determining marker sets for each genome bin.
Finished processing 1 of 1 (100.00%) bins (current: dataset_13418).
Marker set written to: /DATA/galaxy-run/galaxy/tools/protologger/Output/jjlmtuce/CheckM_results/lineage.ms
{ Current stage: 0:00:00.592 || Total: 0:02:39.182 }
[CheckM - analyze] Identifying marker genes in bins.
Identifying marker genes in 1 bins with 1 threads:
Finished processing 0 of 1 (0.00%) bins.
Finished processing 1 of 1 (100.00%) bins.
Saving HMM info to file.
{ Current stage: 0:03:52.259 || Total: 0:06:31.442 }
Parsing HMM hits to marker genes:
Finished parsing hits for 1 of 1 (100.00%) bins.
Aligning marker genes with multiple hits in a single bin:
Finished processing 0 of 1 (0.00%) bins.
Finished processing 1 of 1 (100.00%) bins.
{ Current stage: 0:00:01.213 || Total: 0:06:32.655 }
Calculating genome statistics for 1 bins with 1 threads:
Finished processing 0 of 1 (0.00%) bins.
Finished processing 1 of 1 (100.00%) bins.
{ Current stage: 0:00:00.286 || Total: 0:06:32.942 }
[CheckM - qa] Tabulating genome statistics.
Calculating AAI between multi-copy marker genes.
Reading HMM info from file.
Parsing HMM hits to marker genes:
Finished parsing hits for 1 of 1 (100.00%) bins.
{ Current stage: 0:00:00.900 || Total: 0:06:33.843 }
Traceback (most recent call last):
File "/DATA/galaxy-run/galaxy/tools/protologger/protologger-0.98-GalaxyEdition.py", line 582, in <module>
sim,total = Similarity(seq1,seq2)
NameError: name 'seq1' is not defined
Do you have any idea how to solve this?
Thank you!
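The usearch warning earlier in the log may or may not be related to the final NameError, but it is easy to check independently. Below is a minimal sketch that lists sequence lines in a FASTA file containing anything other than letters. The function name `fasta_nonletter_lines` is made up for illustration, and it assumes the stray space is in a sequence line rather than in the header (the header in the warning also contains a space). Alignment files with gap characters would be flagged as well.

```shell
# Print "file:line: content" for every sequence line (a line not starting
# with '>') that contains at least one character other than a letter.
fasta_nonletter_lines() {
    awk '!/^>/ && /[^A-Za-z]/ { print FILENAME ":" FNR ": " $0 }' "$1"
}

# Usage (path taken from the log above):
# fasta_nonletter_lines /DATA/galaxy-run/galaxy/tools/protologger/bin/16S-SILVA/LTP-DB/LTPs132_SSU_compressed.fasta
```

If this prints nothing, the sequence lines are clean and the warning most likely refers to the space in the header.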
I was wondering if users could opt for a more stringent cutoff of 99% when searching against the amplicon libraries available in the IMNGS database? Also, I think there are >300,000 libraries available in the IMNGS database, but only a subset of them appears to be searched by Protologger?
Hi! First of all, thank you for developing an amazing tool for budding taxonomists.
Recently, I uploaded a genome and a 16S FASTA file for one of my novel strains. It took almost three days for the files to process. Then today, when I ran Protologger, it returned the job message "Fatal error: Exit code 1 ()". I would like to know why my job failed.
Thank you in advance!
Some details I think you would require:
History Content API ID: 88b0d7c2322f1085
Job API ID: 4e77bbc3b46ddd17
History API ID: dea5f1d7ed1cb218
UUID: ea5ec2b9-6e26-4a05-abdd-5a1e4d88acfa
I'm trying to install Protologger using the setup-protologger-env.sh script. So far I have hit two problems:
# Download GTDB database
source ~/anaconda3/etc/profile.d/conda.sh
source ~/miniconda3/etc/profile.d/conda.sh
In my case, I have a ~/miniconda3 directory, but no ~/anaconda3, so as a result of 'set -e' the script fails when trying to source a path which does not exist.
These two commands also assume that conda is installed in one of the default directories, ~/anaconda3 or ~/miniconda3, so the script would also fail for anyone whose conda installation is in a non-standard location.
You could test for the presence of these files before trying to source them which would stop the script failing, or try to detect the conda installation location before trying to source them.
sudo python3.6 setup.py install
That is going to fail for the vast majority of users trying to install the software on a system they do not control, e.g. an HPC cluster. Only users with sudo access can run this command, and it would be dangerous for them to do so without knowing in advance what it does. I saw nothing in the installation docs about root access being required. For users who do have sudo access, a script that runs commands as root without warning (for example, for a sysadmin who has configured sudo to trust them implicitly) could result in very bad things happening. There should be no need to run commands via sudo when installing into a conda environment.
I'll carry on trying to get this running while hacking on the installation script, and will pass on any further findings!
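As a rough sketch of the sudo-free alternative: the script could confirm that a conda environment is active and then install with that environment's own Python. `conda_env_active` is a hypothetical helper, and I am assuming the environment's `python` is the interpreter the script intends to use.

```shell
# Return 0 only if the argument is a non-empty path to an existing directory.
# Intended to be called with "$CONDA_PREFIX", which conda sets on activation.
conda_env_active() {
    [ -n "${1:-}" ] && [ -d "$1" ]
}

# Usage inside the setup script:
#
# if conda_env_active "${CONDA_PREFIX:-}"; then
#     python setup.py install    # installs into the active environment, no sudo
# else
#     echo "Activate the conda environment before installing." >&2
#     exit 1
# fi
```

Because the environment lives in a user-writable directory, nothing here needs root.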
Hi,
I've managed to run Protologger on my genome. I tried uploading the overview.txt and using it as the input for GAN, but nothing seems to be generated. Do I need to edit the overview.txt before uploading it?
May I suggest the addition of a new KEGG grouping, "Denitrification, nitrate => nitrogen" (as described here: https://www.genome.jp/kegg-bin/show_module?M00529)? Bacteria that are capable of denitrification can have genus names such as 'Nitratireductor' or species names such as 'denitrificans', 'nitrativorans', 'nitratireducenticrescens'.
Hi,
Nice tool, but I'll wait for the full conda version before using it more extensively [not enough patience to use it over Galaxy ;-)]. May I suggest the addition of a new KEGG grouping, "Methane/C1 metabolism", for methylotrophic bacteria? Methylotrophic bacteria that are capable of growing on C1 compounds (e.g., methane, methanol) as their sole carbon and energy sources can have the prefix 'Methyl' added to their taxon names.
Hello!
Thanks for developing and maintaining Protologger. It is distributed without a license at the moment, which is probably an oversight and/or complicated by the many dependencies. Still, having no license can be problematic: https://choosealicense.com/no-permission/.
Maybe looking up the most restrictive license among your dependencies and going from there could be an option?
Best,
I think it would be useful if the Protologger overview could inform users about the completeness of amino-acid biosynthetic pathways in their input organism (as the GapMind tool already does at https://papers.genomics.lbl.gov/cgi-bin/gapView.cgi). For example, users could analyze their MAGs/SAGs on Protologger and, based on the absence of genes for the synthesis of specific amino acids, take targeted cultivation-based approaches (such as designing cultivation media supplemented with the missing amino acids) for any of their potentially auxotrophic (as-yet-uncultivated) organisms. Besides, I think it would still be informative to see a sentence in the protologue commenting on the organism's ability, or lack thereof, to synthesize all amino acids.