Comments (6)
Note that the instructions for using a packed conda environment with Spark/PySpark, at https://conda.github.io/conda-pack/spark.html, do not mention the need to run `conda-unpack` on the nodes.
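For context, conda-pack bundles an unpack script inside the archive itself (on typical Linux packs it lands at `bin/conda-unpack`; treat that path as an assumption). A minimal sketch for inspecting what a packed archive contains before shipping it to the cluster with `--archives`:

```python
import tarfile

def packed_members(archive_path):
    """Return the set of member paths inside a conda-pack archive.

    Handy for checking, e.g., whether the bundled unpack script
    ("bin/conda-unpack" on typical Linux packs -- an assumption)
    was included before the archive is shipped to the cluster.
    """
    with tarfile.open(archive_path) as tf:
        return set(tf.getnames())
```

For example, `"bin/conda-unpack" in packed_members("environment.tar.gz")` tells you whether the script is even available on the nodes (the archive name here is a placeholder).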
from conda-pack.
`conda-unpack` only needs to be run if there are absolute paths embedded in libraries that need to be resolved before use. This is rare, especially when using the environment from Python. Also, in common configurations the directory YARN localizes (unpacks) the archive to is read-only to the user, so `conda-unpack` would fail. Many users have distributed conda environments with conda-pack to run workloads on YARN (both with Spark and with Dask) with no need to run `conda-unpack`.

Is there a specific library that requires an absolute path that is causing you problems here? Or is this something we should clarify in our docs?
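If you want to check empirically whether an unpacked environment still has the original absolute prefix embedded anywhere, here is a rough sketch. The old prefix value is an assumption you must supply, and note `conda-unpack` itself works from precise per-file records rather than a byte scan like this:

```python
import os

def files_with_embedded_prefix(env_dir, old_prefix):
    """Find files that still embed the environment's original absolute
    path -- the only files conda-unpack would actually rewrite.

    `old_prefix` is an assumption: the path the env was created at
    (e.g. "/home/user/miniconda3/envs/myenv").
    """
    needle = old_prefix.encode()
    hits = []
    for root, _dirs, names in os.walk(env_dir):
        for name in names:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except OSError:
                pass  # sockets, dangling symlinks, unreadable files
    return sorted(hits)
```

An empty result suggests skipping `conda-unpack` is safe for that environment; a non-empty one only tells you where to look, not that anything is actually broken.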
Hi @jcrist, after reading the docs, in trying to understand what kinds of packages were likely to need path fix-up, I speculated that most pure-Python packages would be okay to run without `conda-unpack`, but that complex packages like `tensorflow`, which link to various C libraries and access hardware drivers, were likely to need path fix-up.

More concretely, I looked in the `conda-unpack` script for a relatively simple environment with just the packages needed for the PySpark unit tests to run (`numpy`, `pandas`, `pyarrow`, and `scipy`, with Python 2.7), and there were more than 400 lines of fix-ups, which I supposed were important to do. In a more complex environment adding `tensorflow`, `tensorflow-hub`, `scikit-learn`, `psycopg2`, `pytorch-cpu`, and `cython` (with Python 3.6), there were 900+ lines of fix-ups.

That said, I have not done the extensive testing that would be needed to actually find and prove specific problems resulting from NOT running `conda-unpack`.

If you say we don't need to run `conda-unpack` for these common AI libraries, I'll take your word for it. But I would like to understand why these hundreds of fix-ups are okay to ignore, and yes, it would be good to expand the docs about what categories of packages typically have absolute paths and therefore need `conda-unpack`. Thanks.
Often absolute paths are embedded in binary files for stacktraces only, and don't need to be rewritten for the library to work properly. I've only come across one library so far that required running `conda-unpack` to function properly (`clear`, installed as part of `ncurses`), and this would never be used by users as part of a dask/spark job. I can't give you a definitive list of "these libraries are ok, these ones are not" because all libraries are different, but in my experience most libraries (numpy, pandas, scipy, scikit-learn, pyarrow, tensorflow, etc.) work fine as-is.

> yes it would be good to expand the docs about what categories of packages typically have absolute paths and therefore need conda-unpack. Thanks.

Sure. If you have time to submit a PR adding language to the Spark docs I'd happily merge it. I'm unlikely to get to this in the near future.
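One cheap way to triage those hundreds of fix-up entries, following the "stacktraces only" reasoning above: a shebang line pointing at the old prefix breaks a script only when it is executed directly, while a prefix buried elsewhere in a binary is frequently just a build-time debug string. A hedged heuristic sketch (this is not how `conda-unpack` actually decides; `old_prefix` is whatever path the env was originally built at):

```python
def shebang_points_at(path, old_prefix):
    """Heuristic: does this file's shebang reference the old absolute
    prefix?  Such scripts fail only if run directly (./script); they
    still work when invoked as `python script`.  A rough triage aid,
    not conda-unpack's real logic.
    """
    with open(path, "rb") as fh:
        first = fh.readline()
    return first.startswith(b"#!") and old_prefix.encode() in first
```

Under this heuristic, a prefix hit that is neither a shebang nor otherwise dereferenced at runtime is a candidate for the "okay to ignore" pile.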
Truly appreciate the explanation. I will propose a PR for the doc change, adding your info and a section listing packages known to need path fix-up (with an invitation to add to the list when encountered). Give me a couple of days, as I'm finishing some other work :-)
Hi there, thank you for your contribution!
This issue has been automatically locked because it has not had recent activity after being closed.
Please open a new issue if needed.
Thanks!