timkahlke / basta
Basic Sequence Taxonomy Annotator
License: GNU General Public License v3.0
Hi timkahlke,
According to the docs, the mapping database may be optional?
# download and set up genbank and uniprot mappings
# NOTE: this might not be needed for you. See Wiki for details
basta download gb
basta download prot
But when running BASTA, a mapping database must be given:
# Infer one LCA for each query sequence of blast against uniprot
basta sequence BLAST_OUTPUT_FILE BASTA_OUTPUT_FILE prot
# Infer one LCA for the complete blast output file
basta single BLAST_OUTPUT_FILE prot
# Infer one LCA for each blast output file in a given directory
basta multiple BLAST_OUTPUT_DIRECTORY BASTA_OUTPUT_FILE prot
I just want to use BASTA to compute the LCA from DIAMOND output. Is the mapping parameter really required? Thanks.
Line 101 in AssignTaxonomy.py has to be changed to
lca = self._assign_single(os.path.join(blast_dir,bf),db_file,best)
Hi !
I'm trying to assign taxonomy to my data using a custom database built from GenBank and local sequences, with a high percentage of similarity (99%).
I ran blastn to obtain all hits to my sequences with >99% similarity and then passed the result to BASTA to obtain the LCA taxonomy.
I was very confused by how few matches resulted, so I reran the analysis with the verbose option to see whether the taxonomies of my hits were very divergent. I noticed that none of my sequences with only one blastn hit were assigned the taxonomy of that hit. Is there any way to change this?
Best regards,
Marion
Hello:
My DIAMOND run produced an output file of about 10 GB, and I now want to use BASTA to estimate the species in it.
However, BASTA has been running for 10 days.
Could BASTA use more threads to make it faster?
What can I do to speed it up?
Thanks
Conda installation on MacOS returns this error:
CondaVerificationError: The package for krona located at /Users/tomasz/miniconda3/pkgs/krona-2.7.1-pl526_1
appears to be corrupted. The path 'opt/krona/lib/._KronaTools.pm'
specified in the package manifest cannot be found.
I installed BASTA from the Conda package (Python 3), but I am not able to set up the taxonomy.
taxdump.tar.gz.md5 100%[===============================================================>] 49 --.-KB/s in 0s
2022-02-15 15:38:52 (5.04 MB/s) - ‘/root/.basta/taxonomy/taxdump.tar.gz.md5’ saved [49]
Traceback (most recent call last):
File "/opt/miniconda/envs/basta_py3/bin/basta", line 4, in <module>
__import__('pkg_resources').run_script('BASTA==1.4', 'basta')
File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 662, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1459, in run_script
exec(code, namespace, namespace)
File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/EGG-INFO/scripts/basta", line 118, in <module>
main.run_basta(args)
File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 89, in run_basta
self._basta_taxonomy(args)
File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/BastaMain.py", line 186, in _basta_taxonomy
dutils.down_and_check("ftp://ftp.ncbi.nih.gov/pub/taxonomy/","taxdump.tar.gz",args.directory)
File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 60, in down_and_check
while(check_md5(md5,out_dir)):
File "/opt/miniconda/envs/basta_py3/lib/python3.10/site-packages/BASTA-1.4-py3.10.egg/basta/DownloadUtils.py", line 46, in check_md5
filehash.update(open(os.path.join(path,l[1])).read())
TypeError: 'filter' object is not subscriptable
Can you help me understand the problem?
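The `TypeError` above is what Python 3 raises when the result of `filter(...)` is indexed, because `filter` returns an iterator there rather than a list. The following is a minimal sketch of an MD5 check that avoids this, not BASTA's actual code: the function name `md5_matches` is hypothetical, and the checksum file is assumed to hold the digest followed by the file name on one line.

```python
import hashlib
import os


def md5_matches(md5_file, directory):
    """Return True if the file named in md5_file has the digest listed there."""
    with open(md5_file) as handle:
        # str.split() already drops empty fields and returns a list,
        # so the result can be indexed in both Python 2 and Python 3.
        fields = handle.readline().split()
    expected, filename = fields[0], fields[1]

    digest = hashlib.md5()
    # Read in binary mode and in chunks: hashlib needs bytes, and
    # the downloaded archives can be large.
    with open(os.path.join(directory, filename), "rb") as data:
        for chunk in iter(lambda: data.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Reading in binary mode also sidesteps a second Python 3 pitfall, since `hashlib` objects reject `str` input.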
"No taxon found" printed multiple times for the same non-found taxon.
Hi,
When attempting to install BASTA in conda, I got the following error. The YAML file seems to be badly interpreted.
$ wget https://github.com/timkahlke/BASTA/blob/master/environment_linux.yml
$ conda env create -f environment_linux.yml
...
$ /data/anaconda3/bin/conda-env create -f ./environment_linux.yml
Traceback (most recent call last):
File "/data/anaconda3/lib/python3.6/site-packages/conda/exceptions.py", line 640, in conda_exception_handler
return_value = func(*args, **kwargs)
File "/data/anaconda3/lib/python3.6/site-packages/conda_env/cli/main_create.py", line 78, in execute
directory=os.getcwd())
File "/data/anaconda3/lib/python3.6/site-packages/conda_env/specs/__init__.py", line 20, in detect
if spec.can_handle():
File "/data/anaconda3/lib/python3.6/site-packages/conda_env/specs/yaml_file.py", line 14, in can_handle
self._environment = env.from_file(self.filename)
File "/data/anaconda3/lib/python3.6/site-packages/conda_env/env.py", line 80, in from_file
return from_yaml(yamlstr, filename=filename)
File "/data/anaconda3/lib/python3.6/site-packages/conda_env/env.py", line 68, in from_yaml
data = yaml.load(yamlstr)
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/main.py", line 75, in load
return loader.get_single_data()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/constructor.py", line 60, in get_single_data
node = self.get_single_node()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/composer.py", line 53, in get_single_node
document = self.compose_document()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/composer.py", line 76, in compose_document
self.get_event()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/parser.py", line 136, in get_event
self.current_event = self.state()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/parser.py", line 215, in parse_document_end
token = self.peek_token()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/scanner.py", line 144, in peek_token
self.fetch_more_tokens()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/scanner.py", line 239, in fetch_more_tokens
return self.fetch_value()
File "/data/anaconda3/lib/python3.6/site-packages/ruamel_yaml/scanner.py", line 598, in fetch_value
self.get_mark())
ruamel_yaml.scanner.ScannerError: mapping values are not allowed here
in "<unicode string>", line 323, column 24:
<!-- blob contrib key: blob_contributors:v21:0b8a2db9 ...
Regards,
Thierry
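The `<!-- blob contrib key` fragment in the scanner error suggests that the downloaded file is likely the HTML page GitHub serves for `/blob/` URLs, not the YAML file itself. As a hedged sketch, the helper below (a hypothetical name, `github_raw_url`) rewrites such a URL into the corresponding `raw.githubusercontent.com` form. It assumes the branch name contains no slash.

```python
from urllib.parse import urlsplit, urlunsplit


def github_raw_url(blob_url):
    """Turn a github.com '/blob/' URL into its raw-content equivalent."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / ref / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub blob URL: %s" % blob_url)
    owner, repo, _blob, ref, *path = segments
    raw_path = "/" + "/".join([owner, repo, ref] + path)
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


print(github_raw_url(
    "https://github.com/timkahlke/BASTA/blob/master/environment_linux.yml"
))
```

Passing the printed URL to `wget` should fetch the plain YAML file, which `conda env create -f` can then parse.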
Hey @timkahlke
I am using a custom mapping database from GTDB in the format specified in the wiki. But when I use it with the create_db command, I get the following error.
mapping file-
accession accession.version taxid gi
GCA007129655 GCA007129655.1 2022
GCF000979455 GCF000979455.1 669
GCA007280465 GCA007280465.1 3546
GCF000025865 GCF000025865.1 4262
GCA004525545 GCA004525545.1 3017
GCA007118145 GCA007118145.1 2546
code used-
/home/j/jigyasa-arora/local/BASTA/bin/basta create_db accession2taxid.tsv prot_mapping.db 0 2
error-
Creating database
[BASTA STATUS] Reading mapping file
This might take a while, please be patient ...
Traceback (most recent call last):
File "/home/j/jigyasa-arora/local/BASTA/bin/basta", line 115, in <module>
main.run_basta(args)
File "/home/j/jigyasa-arora/.local/lib/python3.7/site-packages/BASTA-1.3.2.3-py3.7.egg/basta/BastaMain.py", line 86, in run_basta
self._basta_create_db(args)
File "/home/j/jigyasa-arora/.local/lib/python3.7/site-packages/BASTA-1.3.2.3-py3.7.egg/basta/BastaMain.py", line 171, in _basta_create_db
dbutils.create_db(args.directory,args.input,args.output,args.key,args.value)
File "/home/j/jigyasa-arora/.local/lib/python3.7/site-packages/BASTA-1.3.2.3-py3.7.egg/basta/DBUtils.py", line 67, in create_db
lookup.put(ls[i1],ls[i2])
TypeError: Argument 'key' has incorrect type (expected bytes, got str)
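The error message says the database `put` call expects `bytes` but received `str`, which is typical when Python 2 code that writes to a LevelDB binding runs under Python 3. A minimal sketch of the usual remedy follows: encode both columns before storing them. The generator name `iter_mapping_pairs` is hypothetical and this is not BASTA's own implementation.

```python
def iter_mapping_pairs(lines, key_col, value_col, encoding="utf-8"):
    """Yield (key, value) pairs as bytes from tab-separated mapping lines."""
    needed = max(key_col, value_col)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= needed:
            # Skip short or empty lines instead of raising IndexError.
            continue
        yield fields[key_col].encode(encoding), fields[value_col].encode(encoding)


# Illustration with one line from the mapping file above
# (columns: accession, accession.version, taxid).
example = ["GCA007129655\tGCA007129655.1\t2022\n"]
for key, value in iter_mapping_pairs(example, 0, 2):
    # With a real database handle this would be: lookup.put(key, value)
    print(key, value)
```

Values read back from such a database are also `bytes` and would need `.decode()` before being compared with strings.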
Custom db doesn't have ending on database directory
Hi,
I ran blastn online, downloaded the hits table as a CSV, and converted it to a tab-delimited file. This is supposed to be the required input for BASTA. However, I still get the following error:
#
# INDEX ERROR WHILE CHECKING e-value, alingment length OR percent identity!!!.
# Are you sure that your input file has the correct format?
# (For details check https://github.com/timkahlke/BASTA/wiki/3.-BASTA-Usage#input-file-format)
#
#####
Please advise on how to fix this.
Thanks,
Ilya.
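One way to narrow down this kind of error is to check the input before running BASTA. The sketch below assumes the standard 12-column BLAST tabular layout (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `check_blast_tab` is hypothetical, and a file exported from the NCBI web interface may use different columns.

```python
def check_blast_tab(lines, n_columns=12):
    """Return (line_number, message) tuples for lines that do not look like outfmt 6."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # ignore blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        if len(fields) < n_columns:
            problems.append(
                (number, "expected %d tab-separated columns, found %d" % (n_columns, len(fields)))
            )
            continue
        try:
            float(fields[2])   # percent identity
            int(fields[3])     # alignment length
            float(fields[10])  # e-value
        except ValueError:
            problems.append(
                (number, "percent identity, alignment length or e-value is not numeric")
            )
    return problems
```

Calling it as `check_blast_tab(open("hits.tsv"))` (file name illustrative) returns an empty list when every line has the expected shape.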
Hi @timkahlke
Any updates on when a Python 3 version of BASTA will be available? I am hoping to use BASTA within a computational pipeline and would like to avoid having to replace BASTA when Python 2 becomes unsupported by my university's HPC.
Thanks!
Dear @timkahlke ,
I've been trying out BASTA on simulated data; however, I can never get down to the species level.
Here is an example of my blast output:
tmp19 NC_029448.1 91.67 48 4 0 53 100 9950 9997 2e-10 67.6
tmp19 NC_029330.1 91.30 46 4 0 54 99 10854 10899 3e-09 63.9
tmp19 NC_023799.1 91.30 46 4 0 54 99 9948 9993 3e-09 63.9
tmp19 NC_022507.1 90.00 50 4 1 51 100 9961 10009 3e-09 63.9
tmp20 NC_035317.1 100.00 100 0 0 1 100 60015 60114 5e-46 185
tmp21 NC_035995.1 100.00 100 0 0 1 100 24700 24799 5e-46 185
tmp21 NC_029485.1 100.00 100 0 0 1 100 23785 23884 5e-46 185
tmp21 NC_028523.1 100.00 100 0 0 1 100 24181 24280 5e-46 185
For the sequence tmp20, there is only one hit, so I should be able to go down to the species level, since the full taxonomic lineage is known for NC_035317.1.
However, BASTA only goes to the genus level:
tmp20 Eukaryota;Streptophyta;Liliopsida;Alismatales;Hydrocharitaceae;Stratiotes;
Here is the basta command line I used:
basta sequence blast_results_100.out basta_results_100.out gb -m 1 -n 10 -i 99
Strip on empty taxa throws error
When adding a new mapping db files are
Hi,
could you please help me with basta taxonomy problem?
The taxdump.tar.gz gets downloaded but the md5 sum does not match so the file is re-downloaded... resulting in a never-ending loop of re-downloading and md5 sum mismatching...
I don't know whether the problem is on my side or with NCBI.
I would appreciate any advice on that.
With best regards,
Dasa
Hi,
I've been trying to use BASTA on output from DIAMOND. I believe my DIAMOND results are in the correct format, which is the default for BASTA (-outfmt 6), and the accessions I'm finding (via grep) are in the prot.accession2taxid.FULL file that was used to generate my database. However, I am not getting any taxon names when I run BASTA.
Here is an example line for my basta input:
M01019:41:000000000-A5RV8:1:1114:11342:10949 MBS1567671.1 92.2 51 4 0 2 154 139 189 1.43e-26 107
When I use basta, I then get "No mapping found for MBS1567671" and the resulting output file has everything as "unknown".
It looks like for some reason basta is ignoring the ".1", so although it should search for MBS1567671.1 it's searching for MBS1567671. Both my diamond and basta databases were generated using the same version of the prot.accession2taxid.FULL.
It looks like this is similar to a previously posted issue: #11. I re-ran with -v and the file just has all my query sequence names in this format:
###M01019:41:000000000-A5RV8:1:1101:14530:2789
###M01019:41:000000000-A5RV8:1:1101:12152:2947
Any idea what I might need to do to fix this?
I downloaded and created an NCBI taxonomy database using the "-d" option. When I run "basta sequence $INPUT_FILE $OUTPUT_FILE gb", I get the warning "# [BASTA ERROR] No database gb_mapping.db found in /home/XX/.basta/taxonomy. Did you forget to create the specified database or was it a typo?". How do I specify a custom database directory?
Hello @timkahlke, I have a question related to the basta sequence output.
If none of the hits from a query meet the criteria defined by the basta sequence arguments, will this query ID be present in the output as "Unknown"?
I have a DIAMOND output containing hits from 274,861,379 queries. I expected an output containing a line for each query, but my output has 168,316,701 lines. Thus, my Krona chart displays a wrong percentage of Unknown sequences.
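To see which queries are absent from the BASTA output, the two files can be compared directly. This is a sketch under the assumption that both files are tab-separated with the query ID in the first column. The function name `missing_queries` is hypothetical, and holding hundreds of millions of IDs in a set needs a lot of memory, so for data of this size a sorted-file comparison may be more practical.

```python
def missing_queries(blast_lines, basta_lines):
    """Return query IDs that appear in the BLAST/DIAMOND table but not in the BASTA output."""
    assigned = set()
    for line in basta_lines:
        if line.strip() and not line.startswith("#"):
            assigned.add(line.rstrip("\n").split("\t", 1)[0])

    missing = []
    seen = set()
    for line in blast_lines:
        if not line.strip():
            continue
        query = line.rstrip("\n").split("\t", 1)[0]
        # Report each missing query once, in order of first appearance.
        if query not in assigned and query not in seen:
            seen.add(query)
            missing.append(query)
    return missing
```

The returned IDs could then be appended to the BASTA output as "Unknown" before building the Krona chart, if that is the intended interpretation.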
Hey @timkahlke
I followed the tutorial on how to create a database from an already downloaded NCBI mapping file: https://github.com/timkahlke/BASTA/wiki/2.-Initial-Setup#1-download-ncbi-sequence-databases.
The steps I am running-
$ wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz  # custom mapping for the nr database
$ gunzip prot.accession2taxid.gz
$ /work/BASTA-1.3.2.3/bin/basta create_db prot.accession2taxid prot_mapping.db 1 2
# create_db: the second column (index 1) gives the versioned accession IDs (e.g. WP_090162531.1), the third column (index 2) gives the taxids
$ /work/BASTA-1.3.2.3/bin/basta sequence basta-COG0552.txt basta-result-COG0552.txt prot
# run BASTA on the blast output
When I try the new database on my blast output, I get an empty result. Whereas just "grepping" the accession number to the prot.accession2taxid gives an output.
Where am I going wrong? (Using basta on uniref90 database works though)
Hi Tim,
I got everything up and running, now I'm working on figuring out the best settings for my particular dataset. I didn't run into any additional issues after the db setup steps.
Here are some unsolicited tips that I found while working with python that might help you out, as I've written a few tools and have had to go through a similar learning process as you have here. Not sure if you're still working with python regularly but I digress.
At any rate, I noticed that you take some flags and convert them to bool using argparse. This is generally not recommended:
https://docs.python.org/3/library/argparse.html#type and search for bool, you'll see the relevant section.
If you do want True/False values, you'd generally use action='store_true', or, if you want --foo and --no-foo as options, you can use action=argparse.BooleanOptionalAction, which is available since Python 3.9.
I'm sure the vast majority of the time this is a non-issue, but people might be confused if they have to run basta --quiet True to turn on quiet, and then try basta --quiet False and see that it's still quiet. You can confirm this yourself easily (and I'm sure logically you know this already):
$ python -c 'print(bool("False"))'
True
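The same behaviour can be reproduced inside argparse. The sketch below contrasts `type=bool` with the two flag styles mentioned above. The option names are illustrative and are not BASTA's actual options, and `BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    # Pitfall: type=bool calls bool() on the raw string, and every
    # non-empty string (including "False") is truthy.
    parser.add_argument("--quiet-typed", type=bool, default=False)
    # Flag style: present means True, absent means False.
    parser.add_argument("--quiet", action="store_true")
    # Paired --verbose / --no-verbose flags.
    parser.add_argument("--verbose", action=argparse.BooleanOptionalAction, default=False)
    return parser


parser = build_parser()
typed = parser.parse_args(["--quiet-typed", "False"])
flags = parser.parse_args(["--quiet", "--no-verbose"])
print(typed.quiet_typed)           # True, although the user typed "False"
print(flags.quiet, flags.verbose)  # True False
```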
Lastly, for future projects where you have multiple subparsers that share options, you can set them up as I have here:
https://github.com/davised/get_assemblies/blob/main/get_assemblies/__main__.py#L234 (scroll to the for p in all_p: line).
Put the subparsers in a list and iterate over them to add the options to each command in a loop. That way you reduce redundant code and copy/paste errors when you want to change something (like the bool thing above, which then only has to be changed in one place). There may be other, even better ways to handle the subparsers, but this method has worked for me for several different projects.
Cheers,
Ed
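The shared-options loop described in the last tip can be sketched as follows. The subcommand names mirror the BASTA commands quoted earlier on this page, but the options `--quiet` and `--min-hits` are placeholders chosen for illustration, not BASTA's real argument list.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command", required=True)

    sequence_p = sub.add_parser("sequence")
    single_p = sub.add_parser("single")
    multiple_p = sub.add_parser("multiple")

    # Options common to every subcommand are declared once, in a loop.
    all_p = [sequence_p, single_p, multiple_p]
    for p in all_p:
        p.add_argument("--quiet", action="store_true")
        p.add_argument("--min-hits", type=int, default=3)
    return parser


args = build_parser().parse_args(["single", "--quiet"])
print(args.command, args.quiet, args.min_hits)
```

Changing how `--quiet` is parsed then touches a single line, whichever subcommand is used.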
When trying multi get an error
basta2krona.py fails to parse BASTA output because it contains single-column rows starting with ### and empty lines.
Existing code:
def _parseBASTA(bf):
    counts = {}
    with open(bf,"r") as f:
        for line in f:
            ls = line.split("\t")
            try:
                counts[ls[1]] += 1
            except KeyError:
                counts[ls[1]] = 1
    return counts
Proposed code:
def _parseBASTA(bf):
    counts = {}
    with open(bf, "r") as f:
        for line in f:
            # skip empty lines and the "###" query header rows
            if len(line.strip())!=0 and not line.startswith("###"):
                ls = line.split("\t")
                try:
                    counts[ls[1]] += 1
                except KeyError:
                    counts[ls[1]] = 1
    return counts
Test data:
###contig_764080
23 Eukaryota;
23 Eukaryota;Arthropoda;
23 Eukaryota;Arthropoda;Insecta;
23 Eukaryota;Arthropoda;Insecta;Lepidoptera;
23 Eukaryota;Arthropoda;Insecta;Lepidoptera;Bombycidae;
23 Eukaryota;Arthropoda;Insecta;Lepidoptera;Bombycidae;Bombyx;
23 Eukaryota;Arthropoda;Insecta;Lepidoptera;Bombycidae;Bombyx;Bombyx_mori;
###contig_765902
1 Bacteria;
1 Bacteria;Proteobacteria;
1 Bacteria;Proteobacteria;Betaproteobacteria;
1 Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;
1 Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae;
1 Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae;Candidatus_Zinderia;
1 Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae;Candidatus_Zinderia;Candidatus_Zinderia_insecticola;
Index will be out of bounds with existing code.
Dear developer,
I have already downloaded the taxdump files. When I run basta sequence, it says I should create complete_taxa.db. What do I need to do to create complete_taxa.db? I found that an input file is needed, but I don't know which file to use as the input file.
Hi, developer,
Thanks for developing this amazing software. May I please know whether it can be used for taxonomy classification of a de novo assembled genome bin? Thank you!
Hey @timkahlke
I am running an LCA search for diamond blast output against the Uniref90 database using BASTA.
There are two things that I want to ask:
1. I first downloaded the NCBI protein database, and then the Uniprot database. But I got confused and deleted the prot.accession2taxid.gz file. The other mapping file and the folder it generated were still intact before I downloaded the Uniprot database. Would the absence of the prot.accession2taxid.gz file affect the BASTA search against the Uniprot database?
2. After I got no matches against the Uniprot database, I grepped some Uniref90 IDs from my blast output against idmapping_selected.tab.gz, and I couldn't find any common IDs. Could you suggest a reason for that? I downloaded the latest version from https://www.uniprot.org/downloads
Hi! Thank you for the reply. I tried generating a Krona chart with the script and the BASTA output. The Krona output HTML is empty.
$ head Basta_output.tsv
contig_48 Unknown Eukaryota;Arthropoda;Insecta;Coleoptera;Carabidae;Amara;Amara_sp._KAO-2002;
contig_65 Unknown Eukaryota;Arthropoda;Insecta;Coleoptera;Carabidae;Amara;Amara_alpina;
contig_117 Unknown Eukaryota;Arthropoda;Insecta;Hymenoptera;Vespidae;Vespula;Vespula_pensylvanica;
contig_130 Unknown Unknown
contig_214 Unknown Viruses;unknown;unknown;unknown;Polydnaviridae;Bracovirus;Cotesia_sesamiae_bracovirus;
contig_375 Unknown Eukaryota;Arthropoda;Insecta;Coleoptera;Carabidae;Zabrus;Zabrus_ignavus;
contig_408 Viruses;Phixviricota;Malgrandaviricetes;Petitvirales;Microviridae;Sinsheimervirus;Escherichia_virus_phiX174; Viruses;Phixviricota;Malgrandaviricetes;Petitvirales;Microviridae;Sinsheimervirus;Escherichia_virus_phiX174;
contig_565 Unknown Eukaryota;Arthropoda;Insecta;Coleoptera;Carabidae;Amara;Amara_alpina;
contig_597 Unknown Eukaryota;Arthropoda;Insecta;Coleoptera;Zopheridae;Verodes;Verodes_sp._nov._C_ER-2011;
contig_619 Unknown Eukaryota;Arthropoda;Insecta;Lepidoptera;Bombycidae;Bombyx;Bombyx_mori;
Command:
$ python3 basta2krona.py Basta_output.tsv Krona.html
I would appreciate it if you could share a working example output so that I can troubleshoot mine. Link to download my output file: https://docs.google.com/spreadsheets/d/1gIrihuvNo2mV3X0JgQGsKaCuiPd4Xp2IyAKZfLYpY6Q/edit?usp=sharing. The link contains a TSV file from a personal Gmail account.
Hi,
If I use the NR database, which mapping file should I download?
Thanks
Hi @timkahlke,
Is there some argument to hide the warning messages for mappings not found?
The printing of these warnings increases the time needed to finish the process (basta sequence).
Hello @timkahlke ,
Right now it is possible to get a count of hits per DB reference sequence (with the verbose flag), but it is not possible to get a count of hits per LCA. Would it be possible to add that?
Example:
###k79_72
11 Eukaryota;
11 Eukaryota;Streptophyta;
11 Eukaryota;Streptophyta;unknown;
11 Eukaryota;Streptophyta;unknown;Caryophyllales;
11 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;
9 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Beta;
8 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Beta;Beta_vulgaris;
8 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Beta;Beta_vulgaris;
;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Beta;Beta_macrocarpa;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Beta;Beta_macrocarpa;
;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Chenopodium;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Chenopodium;Chenopodium_quinoa;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Chenopodium;Chenopodium_quinoa;
;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Spinacia;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Spinacia;Spinacia_oleracea;
1 Eukaryota;Streptophyta;unknown;Caryophyllales;Chenopodiaceae;Spinacia;Spinacia_oleracea;
;
Here the LCA would be Chenopodiaceae with 11 hits.
What I'm doing on my side is combining the "regular" output of BASTA (that gives the LCA) and the "verbose" output of BASTA that gives count to get a count per LCA, but that's not really clean since I'm parsing both of these output files...
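Until such an option exists, a count per LCA can be derived from the verbose output alone, under a simplifying assumption: the LCA is taken to be the deepest lineage whose count equals the total number of hits. This ignores BASTA's majority and minimum-hit settings, so it can differ from the LCA BASTA reports. The function name `lca_counts` is hypothetical.

```python
def lca_counts(lines):
    """Map each '###query' block to (deepest lineage shared by all hits, hit count)."""
    per_query = {}
    query = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("###"):
            query = line[3:]
            per_query[query] = {}
            continue
        parts = line.split(None, 1)
        # Keep only "<count> <lineage>" rows; stray rows such as ";" are skipped.
        if query is None or len(parts) != 2 or not parts[0].isdigit():
            continue
        per_query[query][parts[1]] = int(parts[0])  # repeated rows collapse here

    results = {}
    for query, counts in per_query.items():
        # Total hits = sum over the top-level lineages (exactly one ';').
        total = sum(c for lineage, c in counts.items() if lineage.count(";") == 1)
        shared = [lineage for lineage, c in counts.items() if c == total]
        if total and shared:
            lca = max(shared, key=lambda lineage: lineage.count(";"))
        else:
            lca = "Unknown"
        results[query] = (lca, total)
    return results
```

For the k79_72 block above, this returns the Chenopodiaceae lineage with a count of 11, matching the expectation stated in the post.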
Hello,
I am trying to use BASTA on some DIAMOND tsv output. When I run basta sequence, however, I get a warning "# [BASTA WARNING] No taxon found for 1346611" and the BASTA output file contains only "unknown" annotations.
My DIAMOND output looks as follows:
T070O:00666:05788\tWP_005933513.1_853\t78.3\t69\t15\t0\t208\t2\t1\t69\t1.7e-24\t119.0
I have created a custom config file to indicate that the first column is the read_id, the second is the annotation_id, etc.
WP_005933513.1_853 is basically the accession number and the taxon id concatenated by a "_", but in order to perform the mapping with BASTA I have created a mapping file, mapping each annotation_id to the corresponding taxon_id like this:
WP_005933513.1_853\t853
I have created a mapping db from that file using basta create_db.
I think that I have done everything correctly, but BASTA is unable to find the taxon even though I can find it with grep in both the mapping file and the taxdump I downloaded. I'm out of ideas...
Hi @timkahlke
I was running basta download -d /mnt/Indices/genomes/basta/taxonomy/ uni to download uniprot database. It is downloading the idmapping_selected.tab and creating the database with name "prot_mapping.db".
After this, I am running basta sequence dataset_19472.txt dataset_19493.txt uni --directory /mnt/Indices/genomes/basta/taxonomy/
[BASTA ERROR] No database uni_mapping.db found in /mnt/Indices/genomes/basta/taxonomy/. Did you forget to create the specified.
Please let me know the reason for the failure.
Dear Tim,
When running the basta commands (from a cloned git repository in the env's bin folder), I get the following buggy output when downloading the taxdump and database association files:
Traceback (most recent call last):
File "./basta", line 117, in <module>
main.run_basta(args)
File "./../basta/BastaMain.py", line 71, in run_basta
self._basta_download(args)
File "./../basta/BastaMain.py", line 147, in _basta_download
dutils.down_and_check(args.ftp,map_file,args.directory)
File "./../basta/DownloadUtils.py", line 56, in down_and_check
self.down(ftp,md5,out_dir)
NameError: global name 'self' is not defined
The files are downloaded, however.
When running a primary trial analysis a full blown error appears:
$/data/anaconda3/envs/basta/bin/BASTA/bin/basta sequence ./data/assembly-nt.tab ./results/assembly-lca.test gb
Traceback (most recent call last):
File "/data/anaconda3/envs/basta/bin/BASTA/bin/basta", line 9, in <module>
from basta import BastaMain as bm
File "/data/anaconda3/envs/basta/bin/BASTA/bin/../basta/BastaMain.py", line 6, in <module>
import plyvel
ModuleNotFoundError: No module named 'plyvel'
The missing module has been installed:
$ conda list
...
plyvel 0.8 py27_0 bnoon
...
I hope you can help.
Kind regards,
Thierry
Hi!
I am trying to assign taxonomy to DIAMOND blastp results using BASTA.
I downloaded the protein (prot) database using basta download prot.
In the database directory I have these two files: prot.accession2taxid.gz.md5, prot.accession2taxid.gz and the
prot_mapping.db folder.
I keep getting an error; it looks like BASTA is looking for a db called complete_taxa.db.
Is there something that I am missing in my command?
thank you
Hey @timkahlke
I am assigning taxonomy for multiple metagenomes, and I was wondering whether it's possible to parallelize the job.
When I try to do so, I get an error even though I am using different databases for the two runs:
Traceback (most recent call last):
File "/work/student/jigyasa-arora/BASTA-1.3.2.3/bin/basta", line 115, in <module>
main.run_basta(args)
File "/home/j/jigyasa-arora/.local/lib/python2.7/site-packages/BASTA-1.3.2.2-py2.7.egg/basta/BastaMain.py", line 80, in run_basta
self._basta_multiple(args)
File "/home/j/jigyasa-arora/.local/lib/python2.7/site-packages/BASTA-1.3.2.2-py2.7.egg/basta/BastaMain.py", line 121, in _basta_multiple
assigner._assign_multiple(args.blast,db_file,args.best_hit)
File "/home/j/jigyasa-arora/.local/lib/python2.7/site-packages/BASTA-1.3.2.2-py2.7.egg/basta/AssignTaxonomy.py", line 101, in _assign_multiple
lca = self._assign_single(os.path.join(blast_dir,bf),db_file,best)
File "/home/j/jigyasa-arora/.local/lib/python2.7/site-packages/BASTA-1.3.2.2-py2.7.egg/basta/AssignTaxonomy.py", line 79, in _assign_single
(tax_lookup, map_lookup) = self._get_lookups(db_file)
File "/home/j/jigyasa-arora/.local/lib/python2.7/site-packages/BASTA-1.3.2.2-py2.7.egg/basta/AssignTaxonomy.py", line 108, in _get_lookups
tax_lookup = db._init_db(os.path.join(self.directory,"complete_taxa.db"))
File "/home/j/jigyasa-arora/.local/lib/python2.7/site-packages/BASTA-1.3.2.2-py2.7.egg/basta/DBUtils.py", line 75, in _init_db
lookup = plyvel.DB(os.path.abspath(db))
File "plyvel/_plyvel.pyx", line 247, in plyvel._plyvel.DB.init
File "plyvel/_plyvel.pyx", line 88, in plyvel._plyvel.raise_for_status
plyvel._plyvel.IOError: IO error: lock /home/j/jigyasa-arora/.basta/taxonomy/complete_taxa.db/LOCK: Resource temporarily unavailable
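The lock error is consistent with how LevelDB (the store behind plyvel) works: a database directory can be opened by only one process at a time, and the traceback shows that both runs open the shared complete_taxa.db. One possible workaround, sketched below as an assumption and not as a documented BASTA feature, is to give each job its own copy of the taxonomy directory and point BASTA at it with the directory option mentioned elsewhere on this page. The function name `private_db_copy` is hypothetical, and the copies can be large.

```python
import os
import shutil
import tempfile


def private_db_copy(source_dir, work_root):
    """Copy a BASTA taxonomy directory into a fresh per-job directory and return its path."""
    job_dir = tempfile.mkdtemp(prefix="basta_db_", dir=work_root)
    name = os.path.basename(os.path.normpath(source_dir))
    target = os.path.join(job_dir, name)
    # copytree requires that the target does not exist yet, which holds
    # because job_dir was just created and is empty.
    shutil.copytree(source_dir, target)
    return target
```

Each parallel job would call this once, run BASTA against the returned directory, and delete the copy afterwards.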
I assembled a genome de novo from FASTQ files and want to remove organelle genomes (mitochondria, chloroplasts, etc.) and plasmid genomes. How should I set up a custom database of organelle and plasmid genomes? The organelle and plasmid genomes were also assembled de novo.
Check for existing file and, if already there, warn or remove