criteo / cluster-pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
License: Apache License 2.0
Hi,
We use cluster-pack in Jupyter notebooks with conda environments.
An issue we've found is that when the kernel runs in a different conda environment than the Jupyter notebook server, the uploaded archive (env_name.tar.gz) contains the environment of the Jupyter server, while the description file env_name.tar.json lists the correct kernel environment.
A concrete example:
- jupyter notebook is installed in and launched from the jupyter conda environment
- a conda environment tf is created and its kernel is installed
- package_path, _ = cluster_pack.upload_env() is called from a notebook using the tf kernel
- jupyter.tar.json and jupyter.tar.gz are uploaded
- jupyter.tar.json correctly lists the libs of the tf conda environment (not jupyter)
- jupyter.tar.gz actually packages the jupyter conda environment
Hi,
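The mismatch described above can be spotted before calling upload_env(): sys.prefix reflects the environment of the interpreter the kernel is actually running, while CONDA_PREFIX reflects the environment that was active when the (server) process started. A minimal sketch, not part of cluster-pack's API:

```python
import os
import sys

def current_conda_envs():
    """Best-effort comparison of the kernel's environment and the
    environment that was active when the process was launched.

    sys.prefix points at the running interpreter's environment;
    CONDA_PREFIX is inherited from the shell that started the server.
    A mismatch reproduces the packaging issue described above.
    """
    kernel_env = os.path.basename(sys.prefix)
    server_env = os.environ.get("CONDA_PREFIX")
    server_env = os.path.basename(server_env) if server_env else None
    return kernel_env, server_env

kernel_env, server_env = current_conda_envs()
if server_env is not None and kernel_env != server_env:
    print(f"warning: kernel runs in {kernel_env!r} but CONDA_PREFIX is "
          f"{server_env!r}; the wrong environment may get packaged")
```

Running this check in the notebook cell right before upload_env() makes the silent mismatch visible.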
1- Is it possible that, without installing pex or creating any virtual env, we can use the pex command from some binary/package and create a pex file?
2- How can we include static files present at some location/Artifactory into the pex executable?
3- How do we handle optional packages installed outside the virtual env, e.g. the NLTK library?
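On question 2, one general approach is to ship static files as package data and read them through importlib.resources rather than filesystem paths, since code inside a pex may execute from a zip where relative paths break. A sketch under that assumption (the helper name is illustrative, not a cluster-pack or pex API):

```python
from importlib import resources

def read_bundled_text(package, resource):
    """Read a text file shipped inside an installed package.

    importlib.resources goes through the package's loader, so this
    works whether the package lives on disk or inside a zip archive
    such as a .pex file.
    """
    return resources.files(package).joinpath(resource).read_text()

# Demonstration on a stdlib package; in practice the static files
# would be declared as package data of your own project.
text = read_bundled_text("json", "__init__.py")
print(len(text))
```

The same call works unchanged after the project is bundled, because no absolute path is ever constructed.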
Hi,
I have a use case where a Python interpreter will not be available on all the nodes. Is it possible to bundle the Python binary along with pex and execute it in an environment where no Python interpreter is present? I know that for those cases we may need to use freezers.
Any suggestion?
Thanks,
Hi, I am not able to run PySpark code unless I bundle pyspark in the .pex file.
Though in normal scenarios we set PYTHONPATH as below:
PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
PYTHONPATH=$SPARK_HOME/python/lib/pyspark.zip:$PYTHONPATH
export PYTHONPATH
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.pyspark.driver.python=./my_application_spark.pex \
  --conf spark.pyspark.python=./my_application_spark.pex \
  --conf spark.executorEnv.PEX_ROOT=./tmp \
  --conf spark.yarn.appMasterEnv.PEX_ROOT=./tmp \
  --files my_application_spark.pex \
  pyspark_pandas.py
It's not able to find pyspark:
"ModuleNotFoundError: No module named 'pyspark'"
Can anyone please help here?
Thanks
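One likely cause: in cluster deploy mode the driver and executors run inside the pex, which does not contain pyspark, and the PYTHONPATH exports from the submitting shell are not propagated to the containers. Besides bundling pyspark into the pex (matching the cluster's Spark version), the Spark-shipped sources can be put on sys.path at startup. A sketch, assuming SPARK_HOME is set on the nodes:

```python
import glob
import os
import sys

def add_pyspark_to_path(spark_home):
    """Prepend Spark's bundled Python sources (pyspark and py4j) to
    sys.path, mirroring the PYTHONPATH exports shown above."""
    candidates = [os.path.join(spark_home, "python")]
    candidates += glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
    added = [p for p in candidates if os.path.exists(p)]
    for p in reversed(added):
        if p not in sys.path:
            sys.path.insert(0, p)
    return added
```

Call this before the first `import pyspark` in the application's entry point.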
I have a requirement where an .egg file is provided for all Python libs, due to security reasons,
but it looks like pex > 2.0 doesn't support picking up .egg files while bundling.
Is there any option to bundle an .egg file from a local directory?
Hi,
I think it would be nice to use semantic versioning for release versions.
https://semver.org/
It means selecting version numbers as MAJOR.MINOR.PATCH:
- the major version is incremented when a breaking change is made
- the minor version when a feature is added
- the patch version when a fix is made
This is very useful for dependents to know how much change was introduced in a new version.
This works best when the major version is >= 1, so I advise always starting at 1.0.0.
Making a breaking change is something that is useful to advertise to dependents, and increasing the major version definitely makes sense in such cases. Being at major version 5 is fine if a few refactorings changed the API a lot.
I believe this versioning scheme is better than the "versioning as marketing" adopted by some software, which consists in saying "our new major version is a big milestone, look at all the new features", which for dependents is not very useful.
In the case of cluster-pack it would mean releasing a 1.0.0, then following the scheme for subsequent releases.
What do you think?
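The ordering that dependents rely on can be made concrete with a few lines of Python; this is just an illustration of MAJOR.MINOR.PATCH comparison, not anything cluster-pack ships:

```python
def parse_semver(version):
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

# Numeric tuple comparison gives the intended ordering, which a plain
# string comparison would get wrong ("1.10.0" < "1.9.3" as strings):
assert parse_semver("1.10.0") > parse_semver("1.9.3")
assert parse_semver("2.0.0") > parse_semver("1.99.99")
```

This is also why version specifiers like `>=1.0,<2.0` work: a dependent can accept every minor and patch release while excluding the next breaking major.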