
ncbitax2lin's People

Contributors

zyxue


ncbitax2lin's Issues

conda create env fails due to package versions

At least on my Ubuntu machine, env-conda.txt does not make 'conda create' happy: the pinned versions of four packages (mkl, numpy, readline, tk) fail to resolve.

After removing the version numbers for these four packages in env-conda.txt, I could build the environment, activate it, and run 'make'.

Hope this helps others.

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
mkl
numpy
openssl=1.0.2k=0
pandas=0.19.2=np112py27_1
pip=9.0.1=py27_1
python=2.7.13=0
python-dateutil=2.6.0=py27_0
pytz=2016.10=py27_0
readline
setuptools=27.2.0=py27_0
six=1.10.0=py27_0
sqlite=3.13.0=0
tk
wheel=0.29.0=py27_0
zlib=1.2.8=3
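
For anyone who wants to apply the same fix automatically, a throwaway Python sketch (it assumes conda's name=version=build pin format and simply drops the pins for those four packages):

unpinned = {"mkl", "numpy", "readline", "tk"}

with open("env-conda.txt") as fh:
    lines = fh.read().splitlines()

with open("env-conda.txt", "w") as fh:
    for line in lines:
        name = line.split("=", 1)[0]
        # comments and other packages pass through untouched; only the
        # four unresolvable packages lose their pins
        fh.write((name if name in unpinned else line) + "\n")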

problems with the lineages.csv.gz

The lineages.csv.gz file cannot be unzipped using gunzip. Is there an obvious reason why this would fail (I have tried on Mac and Unix, downloading with both Google Chrome and Safari)? And is it possible to get a link to an unzipped version of the lineages.csv file? It would save a huge amount of time, as pandas is not installed on my computer or on the remote server that I access. Many thanks.
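
In case it helps others: if gunzip rejects the file, the download may be an HTML error page or a placeholder rather than the real gzip archive, and the file can also be read without pandas using only the standard library. A sketch (a real gzip file starts with the magic bytes 0x1f 0x8b):

import csv
import gzip

# Check the gzip magic bytes; anything else (e.g. an HTML page saved as
# lineages.csv.gz) is why gunzip fails.
with open("lineages.csv.gz", "rb") as fh:
    if fh.read(2) != b"\x1f\x8b":
        raise SystemExit("not a gzip file -- re-download it")

# Stream the CSV without pandas, one lineage record at a time.
with gzip.open("lineages.csv.gz", "rt", newline="") as fh:
    reader = csv.reader(fh)
    header = next(reader)
    for row in reader:
        pass  # process one row here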

Can't seem to get this running on my mac

Everything seems to be fine through 'conda activate venv' (though I had to switch to bash to get this to work, as it doesn't work in tcsh - perhaps this should be mentioned in the readme...). Anyway, when I run the next command, 'make', I get this error:

usage: md5sum [-bv] [-c [file]] | [file...]
Generates or checks MD5 Message Digests
    -c  check message digests (default is generate)
    -v  verbose, print file names when checking
    -b  read files in binary mode
The input for -c should be the list of message digests and file names
that is printed on stdout by this program when it generates digests.
make[1]: *** [taxdump.tar.gz] Error 2
make: *** [taxdump] Error 2

So it looks like the Makefile's call to md5sum assumes a different md5sum implementation than the one installed on my Mac?
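
In the meantime, the digest can be computed portably with Python's standard library instead of the platform's md5sum (a sketch; EXPECTED is a placeholder for whatever digest the Makefile checks against):

import hashlib

EXPECTED = "..."  # placeholder: the digest the Makefile pins

md5 = hashlib.md5()
with open("taxdump.tar.gz", "rb") as fh:
    # read in 1 MiB chunks so large archives don't need to fit in memory
    for chunk in iter(lambda: fh.read(1 << 20), b""):
        md5.update(chunk)
print(md5.hexdigest(), "OK" if md5.hexdigest() == EXPECTED else "MISMATCH")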

And if I try to run the script, I get:

./ncbitax2lin.py 
./ncbitax2lin.py: line 1: import: command not found
./ncbitax2lin.py: line 2: import: command not found
./ncbitax2lin.py: line 3: import: command not found
./ncbitax2lin.py: line 4: import: command not found
./ncbitax2lin.py: line 5: import: command not found
./ncbitax2lin.py: line 6: import: command not found
./ncbitax2lin.py: line 8: import: command not found
 from: can't read /var/mail/utils
./ncbitax2lin.py: line 13: syntax error near unexpected token `newline'
./ncbitax2lin.py: line 13: `logging.basicConfig('

I'm not very experienced with Python. Is this a problem on my end, or is there an incompatibility issue?

Error in installation

Dear Xue,

I'm happy to use your project for lineage conversion. However, I encountered a problem installing it under Python 3.7.
The error is below:

I would very much appreciate your help resolving this issue.

Thanks in advance!

Best regards.

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/ncbitax2lin/
Collecting ncbitax2lin
  Using cached ncbitax2lin-2.0.2-py3-none-any.whl (8.1 kB)
Requirement already satisfied, skipping upgrade: typing-extensions<4.0.0,>=3.7.4 in /gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages (from ncbitax2lin) (3.7.4.2)
Collecting fire<0.4.0,>=0.3.1
  Using cached fire-0.3.1.tar.gz (81 kB)
Collecting pandas<2.0.0,>=1.0.3
  Downloading pandas-1.0.3-cp37-cp37m-manylinux1_x86_64.whl (10.0 MB)
     |█▎                              | 409 kB 4.5 kB/s eta 0:35:18
ERROR: Exception:
Traceback (most recent call last):
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/contrib/pyopenssl.py", line 313, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1822, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1622, in _raise_ssl_error
    raise WantReadError()
OpenSSL.SSL.WantReadError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 507, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/http/client.py", line 447, in read
    n = self.readinto(b)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/http/client.py", line 491, in readinto
    n = self.fp.readinto(b)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/contrib/pyopenssl.py", line 326, in recv_into
    raise timeout("The read operation timed out")
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
    status = self.run(options, args)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
    return func(self, options, args)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 333, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 179, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 362, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/resolution/legacy/resolver.py", line 314, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 412, in prepare_linked_requirement
    hashes=hashes,
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 198, in unpack_url
    hashes=hashes,
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 124, in get_http_url
    link, downloader, temp_dir.path, hashes
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 220, in _download_http_url
    for chunk in download.chunks:
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/progress_bars.py", line 166, in iter
    for x in it:
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_internal/network/utils.py", line 39, in response_chunks
    decode_content=False,
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 564, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 529, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/gss1/home/zsm20181019/anaconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

Does this still work with current taxonomy dump?

Hi,

Thanks for providing this very useful script! Unfortunately, I haven't been able to run ncbitax2lin on the latest taxonomy dump from NCBI. Could NCBI have changed the format? It looks like the number of columns isn't what the script expects (see the stack trace below). Could you please verify whether it works on the current nodes.dmp and names.dmp, and if it does, would you be able to save a current lineage file version and share it via gitlab (the latest one there is from 2019, which doesn't contain SARS-CoV-2)? Many thanks!

Traceback (most recent call last):
  File "/users/fraser/golubchi/.local/bin/ncbitax2lin", line 10, in <module>
    sys.exit(main())
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/ncbitax2lin/ncbitax2lin.py", line 192, in main
    fire.Fire(taxonomy_to_lineages)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/ncbitax2lin/ncbitax2lin.py", line 171, in taxonomy_to_lineages
    df_data = data_io.read_names_and_nodes(names_file, nodes_file)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/ncbitax2lin/data_io.py", line 77, in read_names_and_nodes
    nodes_df = load_nodes(nodes_file)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/ncbitax2lin/utils.py", line 23, in timed_func
    result = func(*args, **kwargs)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/ncbitax2lin/data_io.py", line 38, in load_nodes
    "comments",
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "/users/fraser/golubchi/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 952, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1028, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1338, in pandas._libs.parsers.TextReader._get_column_name
IndexError: list index out of range
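
One way to test the changed-format theory is to count the fields per row instead of assuming a fixed schema (a sketch; it assumes the standard .dmp layout of fields separated by "\t|\t" with a trailing "\t|"):

import pandas as pd

# Count the fields per row in nodes.dmp instead of assuming a fixed schema.
with open("taxdump/nodes.dmp") as fh:
    counts = {len(line.rstrip("\t|\n").split("\t|\t")) for line in fh}
print("distinct field counts in nodes.dmp:", sorted(counts))

# Load with generated column names so a changed column count cannot break
# parsing (the final column keeps a trailing "\t|", harmless for this check).
nodes_df = pd.read_csv(
    "taxdump/nodes.dmp",
    sep=r"\t\|\t",
    engine="python",
    header=None,
    names=["col_%d" % i for i in range(max(counts))],
)
print(nodes_df.shape)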

Error in installation

Dear Zyxue,
I am trying to install the program, but I get the following error message. My Python version is 3.10.2.

Keyring is skipped due to an exception: module 'collections' has no attribute 'MutableMapping'
Defaulting to user installation because normal site-packages is not writeable
Collecting ncbitax2lin
  Using cached ncbitax2lin-2.2.0-py3-none-any.whl (10 kB)
Collecting pandas<2.0.0,>=1.0.3
  Using cached pandas-1.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
Collecting typing-extensions<4.0.0,>=3.7.4
  Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Collecting fire<0.4.0,>=0.3.1
  Using cached fire-0.3.1.tar.gz (81 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 14, in <module>
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 12, in <module>
          import setuptools.version
        File "/usr/lib/python3/dist-packages/setuptools/version.py", line 1, in <module>
          import pkg_resources
        File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 77, in <module>
          __import__('pkg_resources.extern.packaging.requirements')
        File "/usr/lib/python3/dist-packages/pkg_resources/_vendor/packaging/requirements.py", line 9, in <module>
          from pkg_resources.extern.pyparsing import stringStart, stringEnd, originalTextFor, ParseException
        File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 672, in _load_unlocked
        File "<frozen importlib._bootstrap>", line 632, in _load_backward_compatible
        File "/usr/lib/python3/dist-packages/pkg_resources/extern/__init__.py", line 43, in load_module
          __import__(extant)
        File "/usr/lib/python3/dist-packages/pkg_resources/_vendor/pyparsing.py", line 943, in <module>
          collections.MutableMapping.register(ParseResults)
      AttributeError: module 'collections' has no attribute 'MutableMapping'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
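
For context on the error itself: Python 3.10 removed the ABC aliases (MutableMapping and friends) from the collections module, and the pyparsing vendored inside that older system setuptools still uses the old path. Upgrading pip and setuptools inside the environment may avoid the crash. A two-assert illustration of the underlying change (Python >= 3.10):

import collections
import collections.abc

# Removed in Python 3.10 after a long deprecation:
assert not hasattr(collections, "MutableMapping")

# The ABC still exists, but only under collections.abc:
assert issubclass(dict, collections.abc.MutableMapping)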

pip install error

pip install -U ncbitax2lin
ERROR: Could not find a version that satisfies the requirement ncbitax2lin (from versions: none)
ERROR: No matching distribution found for ncbitax2lin

KeyError: 1

I'm running the scripts as instructed in Anaconda and I get this error. I'm not good enough in Python to figure out the problem. Can you help?

Console feed:

(base) C:\Users\BPIL>ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp
2021-08-31 13:32:38,637|INFO|time spent on load_nodes: 0:00:04.046432
2021-08-31 13:32:45,796|INFO|time spent on load_names: 0:00:07.158943
2021-08-31 13:32:48,974|INFO|# of tax ids: 2,359,686
2021-08-31 13:32:49,420|INFO|df.info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2359686 entries, 0 to 2359685
Data columns (total 4 columns):
 #   Column         Dtype
---  ------         -----
 0   tax_id         int64
 1   parent_tax_id  int64
 2   rank           object
 3   rank_name      object
dtypes: int64(2), object(2)
memory usage: 367.0 MB

2021-08-31 13:32:49,421|INFO|Generating TAXONOMY_DICT ...
2021-08-31 13:33:00,737|INFO|found 12 cpus, and will use all of them to find lineages for all tax ids
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\bpil\anaconda3\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\bpil\anaconda3\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "c:\users\bpil\anaconda3\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 78, in find_lineage
    record = TAXONOMY_DICT[tax_id]
KeyError: 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\bpil\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\bpil\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\BPIL\Anaconda3\Scripts\ncbitax2lin.exe\__main__.py", line 7, in <module>
  File "c:\users\bpil\anaconda3\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 192, in main
    fire.Fire(taxonomy_to_lineages)
  File "c:\users\bpil\anaconda3\lib\site-packages\fire\core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\users\bpil\anaconda3\lib\site-packages\fire\core.py", line 468, in _Fire
    target=component.__name__)
  File "c:\users\bpil\anaconda3\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\users\bpil\anaconda3\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 179, in taxonomy_to_lineages
    lineages = find_all_lineages(df_data.tax_id)
  File "c:\users\bpil\anaconda3\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 101, in find_all_lineages
    return pool.map(find_lineage, tax_ids)
  File "c:\users\bpil\anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\bpil\anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
KeyError: 1
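
A possible explanation (an assumption based on the traceback, not on testing): on Windows, multiprocessing starts workers by spawning fresh interpreters instead of forking, so a module-level TAXONOMY_DICT populated in the parent process is empty in the workers, and the very first lookup (tax_id 1, the root) fails. A minimal sketch of the usual workaround, reusing the names from the traceback:

import multiprocessing as mp

TAXONOMY_DICT = {}  # re-populated inside each worker by the initializer

def _init_worker(taxonomy_dict):
    # Spawned workers do not inherit the parent's globals, so the dict
    # is handed over explicitly when each worker starts.
    global TAXONOMY_DICT
    TAXONOMY_DICT = taxonomy_dict

def find_lineage(tax_id):
    return TAXONOMY_DICT[tax_id]

def find_all_lineages(tax_ids, taxonomy_dict):
    with mp.Pool(initializer=_init_worker, initargs=(taxonomy_dict,)) as pool:
        return pool.map(find_lineage, list(tax_ids))

if __name__ == "__main__":  # guard required for multiprocessing on Windows
    print(find_all_lineages([1, 2], {1: "root", 2: "Bacteria"}))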

Arrange lineages according to taxonomic hierarchy

Dear zyxue,

I really like the software you created; I find it very useful for comparative analyses between local BLAST searches and the NCBI Taxonomy. I want to compare different organisms (say, Drosophila melanogaster and Arabidopsis thaliana) at each taxonomic rank. This is not straightforward, because different species have different numbers of taxonomic levels in NCBI (e.g., Drosophila melanogaster has 34 taxonomic levels ranging from the species level up to "cellular organisms", while Arabidopsis thaliana has only 21).

I figured that I could manually arrange the entire "ncbi_lineages" table according to the species with the highest number of taxonomic levels, which would also arrange the hierarchies of the other species, and then remove the blank spaces for the species with missing levels. However, I noticed that the "no rank#" and "clade#" columns do not correspond to the same taxonomic level in different organisms (e.g., the "clade" column holds "Opisthokonta" for Drosophila but "Embryophyta" for Arabidopsis, and the two are not taxonomically equivalent). Is there a way for ncbitax2lin to arrange these problematic columns at their correct hierarchical level? Or for the table to be arranged according to the correct taxonomic hierarchy of a specific species, without having to do it manually for each species of interest?

Best regards,
Josué.
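
A sketch of one possible workaround: compare organisms only at the canonical named ranks and skip the positional "no rank"/"clade" columns entirely (the column names are assumptions based on the ncbi_lineages header):

import pandas as pd

# Canonical Linnaean ranks shared by (almost) all lineages; positional
# "no rank"/"clade" columns are skipped because they are not equivalent
# across organisms.
CANONICAL = ["superkingdom", "kingdom", "phylum", "class", "order",
             "family", "genus", "species"]

df = pd.read_csv("ncbi_lineages.csv.gz")
# 7227 = Drosophila melanogaster, 3702 = Arabidopsis thaliana
pair = df[df["tax_id"].isin([7227, 3702])]
cols = ["tax_id"] + [c for c in CANONICAL if c in df.columns]
print(pair[cols].to_string(index=False))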

Can this work in a Python script (without the need to run it on the command line)?

I'm working on metagenomics, and I have a dataset made up of identifications produced by several different software tools.
The dataset has this structure:
Assignment | TaxID | Number_of_reads

I was adding the taxonomy using NCBITaxa to get the full lineage (from kingdom down to genus/species), but it isn't working properly for some reason. I've seen that your script does exactly the same thing, but as a command-line executable.

I wanted to ask whether you could help me make it work from within Python.
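
It may already be callable from a script: the tracebacks elsewhere on this page show the CLI is a thin fire.Fire wrapper around taxonomy_to_lineages, so the function can be imported directly. A sketch (the keyword names are inferred from the CLI flags, so treat them as assumptions):

from ncbitax2lin.ncbitax2lin import taxonomy_to_lineages

# Equivalent of:
#   ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp
taxonomy_to_lineages(
    nodes_file="taxdump/nodes.dmp",
    names_file="taxdump/names.dmp",
)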

Could not get any output

Hello,

When I run the command:

ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp ../ncbi.tax.10.txt

it does not produce any output. The log messages were:

2022-07-14 12:45:57,883|INFO|time spent on load_nodes: 0:00:04.126396
2022-07-14 12:46:05,872|INFO|time spent on load_names: 0:00:07.987480
2022-07-14 12:46:08,622|INFO|# of tax ids: 2,431,352
2022-07-14 12:46:09,087|INFO|df.info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2431352 entries, 0 to 2431351
Data columns (total 4 columns):
 #   Column         Dtype
---  ------         -----
 0   tax_id         int64
 1   parent_tax_id  int64
 2   rank           object
 3   rank_name      object
dtypes: int64(2), object(2)
memory usage: 378.0 MB

2022-07-14 12:46:09,087|INFO|Generating a dictionary of taxonomy: tax_id =tax_unit ...
2022-07-14 12:46:19,067|INFO|size of taxonomy_dict: ~80 MB
2022-07-14 12:46:19,126|INFO|Finding all lineages ...
2022-07-14 12:46:19,126|INFO|will use 6 processes to find lineages for all 2,431,352 tax ids
2022-07-14 12:46:19,139|INFO|chunk_size = 405226
2022-07-14 12:46:19,148|INFO|chunked sizes: [405226, 405226, 405226, 405226, 405226, 405222]
2022-07-14 12:46:19,156|INFO|Starting 6 processes ...
2022-07-14 12:46:19,715|INFO|Joining 6 processes ...
working on tax_id: 2500000
working on tax_id: 2000000
working on tax_id: 1550000
working on tax_id: 1150000
working on tax_id: 500000
working on tax_id: 50000
working on tax_id: 2050000
working on tax_id: 2550000
working on tax_id: 1600000
working on tax_id: 1200000
working on tax_id: 2100000
working on tax_id: 100000
working on tax_id: 2600000
working on tax_id: 1650000
working on tax_id: 1250000
working on tax_id: 2150000
working on tax_id: 150000
working on tax_id: 650000
working on tax_id: 1700000
working on tax_id: 1300000
working on tax_id: 2650000
working on tax_id: 700000
working on tax_id: 200000
working on tax_id: 2200000
working on tax_id: 1350000
working on tax_id: 1750000
working on tax_id: 750000
working on tax_id: 250000
working on tax_id: 1800000
working on tax_id: 1400000
working on tax_id: 2250000
working on tax_id: 2750000
working on tax_id: 850000
working on tax_id: 300000
working on tax_id: 2300000
working on tax_id: 1850000
working on tax_id: 900000
working on tax_id: 2800000
working on tax_id: 1500000
working on tax_id: 1900000
working on tax_id: 2850000
working on tax_id: 350000
working on tax_id: 2350000
working on tax_id: 400000
working on tax_id: 1000000
working on tax_id: 1950000
working on tax_id: 2900000
working on tax_id: 2400000
working on tax_id: 2950000
working on tax_id: 1050000
working on tax_id: 450000
working on tax_id: 2450000
2022-07-14 12:46:45,642|INFO|adding lineages from /tmp/tmpa3bjevds_ncbitax2lin/_lineages_0.pkl ...
2022-07-14 12:46:49,074|INFO|adding lineages from /tmp/tmpa3bjevds_ncbitax2lin/_lineages_1.pkl ...
2022-07-14 12:46:51,566|INFO|adding lineages from /tmp/tmpa3bjevds_ncbitax2lin/_lineages_2.pkl ...
2022-07-14 12:46:53,381|INFO|adding lineages from /tmp/tmpa3bjevds_ncbitax2lin/_lineages_3.pkl ...
2022-07-14 12:46:56,254|INFO|adding lineages from /tmp/tmpa3bjevds_ncbitax2lin/_lineages_4.pkl ...
2022-07-14 12:46:59,116|INFO|adding lineages from /tmp/tmpa3bjevds_ncbitax2lin/_lineages_5.pkl ...
2022-07-14 12:47:00,507|INFO|Preparings all lineages into a dataframe to be written to disk ...
Killed 

Thank you

suggestion: output lineage tax id instead of full name

Would it be possible to have the lineage tax IDs instead of the full names?

taxid	kingdom	phylum	class	order	family	genus	species
11138	10239	2732408	2732506	76804	11118	694002	694005
123595	10239	2732408	2732506	76804	11118	694002	694005
11138	10239	2732408	2732506	76804	11118	694002	694005
11138	10239	2732408	2732506	76804	11118	694002	694005
11128	10239	2732408	2732506	76804	11118	694002	694003
160235	10239	2732408	2732506	76804	11118	694013	694014
11120	10239	2732408	2732506	76804	11118	694013	694014
249065	10239	2732408	2732506	76804	11118	694002	694009
249069	10239	2732408	2732506	76804	11118	694002	694009
258508	10239	2732408	2732506	76804	11118	694002	694009
11120	10239	2732408	2732506	76804	11118	694013	694014
270642	10239	2732408	2732506	76804	11118	693996	277944
267385	10239	2732408	2732506	76804	11118	694002	694009
31631	10239	2732408	2732506	76804	11118	694002	694003
11120	10239	2732408	2732506	76804	11118	694013	694014
694009	10239	2732408	2732506	76804	11118	694002	694009
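
A rough standalone sketch of what the suggestion could look like, walking parent links in nodes.dmp and collecting ancestor tax IDs by rank (not ncbitax2lin's actual implementation):

# Parse nodes.dmp: tax_id, parent_tax_id and rank are the first three
# fields, separated by "\t|\t".
nodes = {}
with open("taxdump/nodes.dmp") as fh:
    for line in fh:
        fields = line.rstrip("\t|\n").split("\t|\t")
        nodes[int(fields[0])] = (int(fields[1]), fields[2])

def lineage_tax_ids(tax_id):
    """Map rank -> ancestor tax_id by walking up to the root (tax_id 1)."""
    out = {}
    while True:
        parent_id, rank = nodes[tax_id]
        if rank != "no rank":
            out.setdefault(rank, tax_id)
        if tax_id == 1:  # the root is its own parent
            return out
        tax_id = parent_id

print(lineage_tax_ids(11138))  # first tax id from the table above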

~\multiprocessing\pool.py", line 771, in get raise self._value: KeyError: 1

Hi,

I was trying to use your tool but got a KeyError: 1 from Python multiprocessing. Any idea what the issue could be? Here is the error output:

Traceback (most recent call last):
  File "c:\users\nauras\programs\python\python39\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\nauras\programs\python\python39\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\users\nauras\programs\python\python39\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 78, in find_lineage
    record = TAXONOMY_DICT[tax_id]
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\nauras\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\nauras\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Nauras\Programs\Python\Python39\Scripts\ncbitax2lin.exe\__main__.py", line 7, in <module>
  File "c:\users\nauras\programs\python\python39\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 192, in main
    fire.Fire(taxonomy_to_lineages)
  File "c:\users\nauras\programs\python\python39\lib\site-packages\fire\core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\users\nauras\programs\python\python39\lib\site-packages\fire\core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "c:\users\nauras\programs\python\python39\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\users\nauras\programs\python\python39\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 179, in taxonomy_to_lineages
    lineages = find_all_lineages(df_data.tax_id)
  File "c:\users\nauras\programs\python\python39\lib\site-packages\ncbitax2lin\ncbitax2lin.py", line 101, in find_all_lineages
    return pool.map(find_lineage, tax_ids)
  File "c:\users\nauras\programs\python\python39\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\nauras\programs\python\python39\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
KeyError: 1

Cheers,
Nauras

How to use your script?

Hi, I am a PhD student with very little coding experience, but I have a large dataset of sequences with NCBI taxIDs for which I would like to get the lineage information. I have two problems:

  1. I made the environment as specified, but when I tried 'make', I got these errors:
/bin/sh: md5sum: command not found
make[1]: *** [taxdump.tar.gz] Error 127
make: *** [taxdump] Error 2

Do you have any suggestions for how I can fix these errors?

  2. I am unsure how to use your code. I have a text file with just a list of the taxIDs, but I also have the output from the BLAST search with other info in it. Could you please provide an example, or offer me some suggestions of what I need to do?

Thank you!

problem

Hi Developer,
I get this error when I try to install the tool:

lfaino@LabReverberi_PT /data/software $ git clone git@github.com:zyxue/ncbitax2lin.git
Cloning into 'ncbitax2lin'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

I even tried using sudo.

Can you help, please?

Cheers
Luigi

How to use your script to trace a protein ID based on its tax ID?

Hi, I am a PhD student and have recently started using ncbitax2lin. I want to trace protein IDs via their NCBI tax IDs. I downloaded prot.accession2taxid and extracted the second and third columns (the gene ID and the tax ID), and I also downloaded RefSeq-release215.catalog, because I saw a method that derives the lineage information from these two files. I also tried using prot.accession2taxid together with names.dmp or lineages.csv to get the information, following the FAQ. Maybe it failed due to the format of these files?
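
For what it's worth, a sketch of one way to join the two sources by tax ID with pandas (the column names follow NCBI's prot.accession2taxid header and ncbitax2lin's lineages table; the mapping file is very large, so chunked reading may be needed in practice):

import pandas as pd

# NCBI's prot.accession2taxid header: accession, accession.version,
# taxid, gi. Keep only the versioned accession and its tax id.
acc2tax = pd.read_csv(
    "prot.accession2taxid.gz",
    sep="\t",
    usecols=["accession.version", "taxid"],
)

# The lineages table produced by ncbitax2lin keys lineages on tax_id.
lineages = pd.read_csv("ncbi_lineages.csv.gz")

merged = acc2tax.merge(lineages, left_on="taxid", right_on="tax_id", how="left")
merged.to_csv("protein_lineages.csv.gz", index=False)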
