Comments (6)
Note that the instructions for using a packed conda environment with Spark/PySpark, at https://conda.github.io/conda-pack/spark.html, do not mention the need to run `conda-unpack` on the nodes.
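For context, conda-pack bundles an unpack script inside the archive itself (on typical Linux packs it lands at `bin/conda-unpack`; treat that path as an assumption). A minimal sketch for inspecting what a packed archive contains before shipping it to the cluster with `--archives`:

```python
import tarfile

def packed_members(archive_path):
    """Return the set of member paths inside a conda-pack archive.

    Handy for checking, e.g., whether the bundled unpack script
    ("bin/conda-unpack" on typical Linux packs -- an assumption)
    was included before the archive is shipped to the cluster.
    """
    with tarfile.open(archive_path) as tf:
        return set(tf.getnames())
```

For example, `"bin/conda-unpack" in packed_members("environment.tar.gz")` tells you whether the script is even available on the nodes (the archive name here is a placeholder).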
from conda-pack.
`conda-unpack` only needs to be run if there are absolute paths embedded in libraries that need to be resolved before use. This is rare, especially when using the environment from Python. Also, in common configurations the directory YARN localizes (unpacks) the archive to is read-only to the user, so `conda-unpack` would fail. Many users have distributed conda environments with conda-pack to run workloads on YARN (both with Spark and with Dask) with no need to run `conda-unpack`.

Is there a specific library that requires an absolute path that is causing you problems here? Or is this something we should clarify in our docs?
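If you want to check empirically whether an unpacked environment still has the original absolute prefix embedded anywhere, here is a rough sketch. The old prefix value is an assumption you must supply, and note `conda-unpack` itself works from precise per-file records rather than a byte scan like this:

```python
import os

def files_with_embedded_prefix(env_dir, old_prefix):
    """Find files that still embed the environment's original absolute
    path -- the only files conda-unpack would actually rewrite.

    `old_prefix` is an assumption: the path the env was created at
    (e.g. "/home/user/miniconda3/envs/myenv").
    """
    needle = old_prefix.encode()
    hits = []
    for root, _dirs, names in os.walk(env_dir):
        for name in names:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except OSError:
                pass  # sockets, dangling symlinks, unreadable files
    return sorted(hits)
```

An empty result suggests skipping `conda-unpack` is safe for that environment; a non-empty one only tells you where to look, not that anything is actually broken.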
Hi @jcrist, after reading the docs, in trying to understand what kinds of packages were likely to need path fix-up, I speculated that most pure-Python packages would be okay to run without `conda-unpack`, but that complex packages like `tensorflow`, which link to various C libraries and access hardware drivers, were likely to need path fix-up.

More concretely, I looked in the `conda-unpack` script for a relatively simple environment with just the packages needed for the PySpark unit tests to run (`numpy`, `pandas`, `pyarrow`, and `scipy`, with Python 2.7), and there were more than 400 lines of fix-ups, which I supposed were important to do. In a more complex environment adding `tensorflow`, `tensorflow-hub`, `scikit-learn`, `psycopg2`, `pytorch-cpu`, and `cython` (with Python 3.6), there were 900+ lines of fix-ups.

That said, I have not done the extensive testing that would be needed to actually find and prove specific problems resulting from NOT running `conda-unpack`.

If you say we don't need to run `conda-unpack` for these common AI libraries, I'll take your word for it. But I would like to understand why these hundreds of fix-ups are okay to ignore, and yes, it would be good to expand the docs about what categories of packages typically have absolute paths and therefore need `conda-unpack`. Thanks.
Often absolute paths are embedded in binary files for stacktraces only, and don't need to be rewritten for the library to work properly. I've only come across one library so far that required running `conda-unpack` to function properly (`clear`, installed as part of `ncurses`), and this would never be used by users as part of a dask/spark job. I can't give you a definitive list of "these libraries are ok, these ones are not" because all libraries are different, but in my experience most libraries (numpy, pandas, scipy, scikit-learn, pyarrow, tensorflow, etc.) work fine as-is.

> yes it would be good to expand the docs about what categories of packages typically have absolute paths and therefore need conda-unpack. Thanks.

Sure. If you have time to submit a PR adding language to the Spark docs I'd happily merge it. I'm unlikely to get to this in the near future.
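One cheap way to triage those hundreds of fix-up entries, following the "stacktraces only" reasoning above: a shebang line pointing at the old prefix breaks a script only when it is executed directly, while a prefix buried elsewhere in a binary is frequently just a build-time debug string. A hedged heuristic sketch (this is not how `conda-unpack` actually decides; `old_prefix` is whatever path the env was originally built at):

```python
def shebang_points_at(path, old_prefix):
    """Heuristic: does this file's shebang reference the old absolute
    prefix?  Such scripts fail only if run directly (./script); they
    still work when invoked as `python script`.  A rough triage aid,
    not conda-unpack's real logic.
    """
    with open(path, "rb") as fh:
        first = fh.readline()
    return first.startswith(b"#!") and old_prefix.encode() in first
```

Under this heuristic, a prefix hit that is neither a shebang nor otherwise dereferenced at runtime is a candidate for the "okay to ignore" pile.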
Truly appreciate the explanation. I will propose a PR for the doc change, adding your info and a section listing packages known to need path fix-up (with an invitation to add to the list when encountered). Give me a couple of days, as I'm finishing some other work :-)
Hi there, thank you for your contribution!
This issue has been automatically locked because it has not had recent activity after being closed.
Please open a new issue if needed.
Thanks!