orchest / orchest
Build data pipelines, the easy way 🛠️
Home Page: https://orchest.readthedocs.io/en/stable/
License: Apache License 2.0
In the pipeline editor we would ideally be able to select multiple edges/connections between pipeline steps. This should work similarly to how you can press Control and click on multiple pipeline steps to select them.
This makes it easier to delete multiple connections at the same time, whilst making the editing experience more consistent between connections and steps.
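The desired interaction can be modelled with a small selection-set sketch (plain Python, just to illustrate the logic; the real implementation lives in the front-end):

```python
# Minimal model of the proposed behaviour: a plain click selects a single
# item (step or connection), while Ctrl+click toggles the item in the
# current selection; Delete would then act on the whole selection set.
def click(selected: set, item: str, ctrl_pressed: bool) -> set:
    if not ctrl_pressed:
        return {item}              # plain click: select only this item
    return selected ^ {item}       # Ctrl+click: toggle membership

selection = click(set(), "conn-1", ctrl_pressed=False)
selection = click(selection, "conn-2", ctrl_pressed=True)
# selection now holds both connections, ready for a single Delete.
```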
Great project, thanks!
I have a small idea from actual use:
Looking forward to your reply.
As a fan and user of the language, and having seen this question already mentioned on HN, it would be great to know what it would take to enable support for Julia notebooks, an environment that typically plugs right in with Python libraries. My impression right now is that we would need to add a base kernel image, similarly to R support.
Currently, pipeline steps can be deleted using Delete and connections between the steps can be deleted using Backspace.
Both Delete and Backspace should work on pipeline steps as well as connections.
Hi Team,
Thanks for this great idea; I'm loving it. However, for local development, I don't see a technological reason to rely on Docker. Many of our data scientists do not use Docker (for various reasons), and there is no good argument for forcing them to use it.
Looking at the source quickly, Orchest should be able to run without a container. Furthermore, there are obvious pain points with Docker (e.g., #56).
Could you please make the Docker dependency optional?
Many thanks!
The configurations of IDEs we integrate in Orchest should be persisted so that a user does not have to configure their IDEs every time. Otherwise (as is currently the case), extensions added to JupyterLab, for example, would have to be reinstalled every time JupyterLab is started.
This feature should be easy to use, so purely defining the configuration programmatically (think dotfiles) is probably not the way to go. Additionally, the configurations should be portable between upgrades of Orchest (and possibly upgrades of the specific IDE service containers).
The current idea is to mount a new directory (from `userdir/.orchest/...`) to the appropriate location in the IDE service container to persist the configurations. For JupyterLab we have to make sure this does not require a rebuild and includes extensions (the `jupyter lab clean` command suggests this approach is possible, since it allows for `--extensions`, `--settings` and `--static` flags).
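The proposed mount could look roughly as follows with the Docker SDK for Python. Both the container path and the image name here are assumptions for illustration only:

```python
def jupyterlab_volumes(userdir: str) -> dict:
    """Volume mapping that persists JupyterLab settings and extensions
    across container restarts. The bind path inside the container is an
    assumption; the real location depends on the JupyterLab image."""
    return {
        f"{userdir}/.orchest/jupyterlab": {
            "bind": "/usr/local/share/jupyter/lab",
            "mode": "rw",
        }
    }

# With the Docker SDK for Python (pip install docker) this would be used as:
# client = docker.from_env()
# client.containers.run("orchest/jupyter-server", detach=True,
#                       volumes=jupyterlab_volumes("/path/to/userdir"))
```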
Notifications should be made optional.
Without notifications the user always needs to check the pipeline or build in order to know whether or not it is done building.
Notifications can be browser based (when inside the application) but possibly also via integrations such as email or Telegram.
This issue keeps track of the integration of the code-server browser based VS Code editor.
As per @howie6879's recommendation we'll look into how and whether it makes sense to expand the available editors beyond JupyterLab.
At the moment VS Code can be used with Orchest by opening the `orchest/userdir/` directory in VS Code directly on the host on which Orchest is installed (or through SSH if it's on a remote server).
In the future we'd like to support more advanced multi-node use cases by building on top of existing container orchestration abstractions. This issue will track progress on this particular feature and the decisions that are made around it.
I.e. when renaming a folder in the JLab file manager (PATCH request).
Build hash to avoid webserver javascript caching on rebuilds.
A more detailed report about the behaviour can be found at docker/for-linux#1034
In short: whenever a Docker operation (e.g. starting or shutting down a container) occurs, the ongoing HTTP requests in Firefox fail (tested on Linux).
We'll await Docker's response to this issue as we think it's not something we can directly address ourselves as it seems a more generic Docker + Linux bug.
Hi there,
Sometimes I delete a project using the file manager and then import it again. This causes Orchest to fail to load the project. The operation process is as follows:
cd orchest/userdir
git clone https://github.com/orchest/quickstart
Orchest will load this project automatically. The next operations in the web UI are as follows:
Finally, import the `quickstart` project by using Git again.
At this point, Orchest can't load the `quickstart` project.
You can check the screenshot that I provided for details:
After opening a pipeline you can go to its settings, where you will see things like "Pipeline name" and a section called "Memory server". We want to add an additional option to this section that enables eviction.
This is done by adding the `auto-eviction` option to the top-level `settings` section in the `pipeline.json` file of that specific pipeline.
{
"name": "pipeline-name",
...
"settings": {
"auto-eviction": true
}
...
}
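A sketch of how a backend could read the new option, assuming `false` as the default when the key is absent:

```python
import json

def auto_eviction_enabled(pipeline_definition: dict) -> bool:
    """Report whether auto-eviction is enabled for a pipeline,
    defaulting to False when the setting is absent (assumed default)."""
    return pipeline_definition.get("settings", {}).get("auto-eviction", False)

definition = json.loads(
    '{"name": "pipeline-name", "settings": {"auto-eviction": true}}'
)
assert auto_eviction_enabled(definition)
```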
For now we are applying the proposed patch to JupyterLab directly until the referenced PR is merged.
Use the fields
{
...
"total_number_of_pipeline_runs": int,
"completed_pipeline_runs": int,
...
}
which are returned by GET /experiments/<experiment_uuid> to set the status of an experiment in the front-end (inside the table you see after clicking "Experiments" in the left-pane menu). It should be something like "5/9", meaning 5 out of 9 pipeline runs of the experiment have completed.
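Formatting the status from those two fields is straightforward; a sketch:

```python
def experiment_status(experiment: dict) -> str:
    """Render "completed/total" for the Experiments table, e.g. "5/9"."""
    completed = experiment["completed_pipeline_runs"]
    total = experiment["total_number_of_pipeline_runs"]
    return f"{completed}/{total}"

# Example fragment of a GET /experiments/<experiment_uuid> response:
print(experiment_status({
    "total_number_of_pipeline_runs": 9,
    "completed_pipeline_runs": 5,
}))  # 5/9
```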
Ability to define pipeline level parameters, besides the already implemented step parameters.
Implementation details:
- Add `"parameters"` to the pipeline definition.

If a step has no incoming steps, then pressing "Run incoming steps" will not execute anything, but the orchest-api will still be called. It would be better if the button is not even shown to the user.
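The front-end check could be as simple as the following sketch, assuming each step in the pipeline definition lists its parents under an "incoming_connections" key:

```python
def has_incoming_steps(pipeline: dict, step_uuid: str) -> bool:
    """True if the step has at least one incoming connection, i.e.
    "Run incoming steps" would actually execute something."""
    step = pipeline["steps"][step_uuid]
    return len(step.get("incoming_connections", [])) > 0

pipeline = {
    "steps": {
        "uuid-a": {"incoming_connections": []},
        "uuid-b": {"incoming_connections": ["uuid-a"]},
    }
}
# Show the button for uuid-b, hide it for uuid-a.
```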
@ricklamers What do you think? We first wanted to add a client-side warning, but I felt this is actually out of place, since clicking away the warning takes the user more time than executing the "empty" pipeline run.
When working interactively you might want to see for which steps the data is actively stored in the memory-server. On the other hand, you also don't see what state is still active after running cells in a Jupyter Notebook.
In addition, the fact that it shows “completed” as a status of a step can already give enough of an indication whether or not the data is in the store.
@ricklamers @fruttasecca Thoughts?
As can be read on Stack Overflow, Celery keeps its list of "revoked" tasks in memory. Therefore a reboot of the container would cause Celery tasks to be rescheduled due to RabbitMQ persistence (RabbitMQ persistence was implemented in PR #8).
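A minimal model of that interaction (plain Python, not actual Celery code) shows why keeping the revoked set only in memory is a problem:

```python
# Minimal model of the behaviour described above: the broker (RabbitMQ)
# persists scheduled tasks, but the worker's revoked-task set lives only
# in worker memory.
class Worker:
    def __init__(self):
        self.revoked = set()          # lost on container reboot

    def should_run(self, task_id: str) -> bool:
        return task_id not in self.revoked

worker = Worker()
worker.revoked.add("task-1")          # user cancels a pipeline run
assert not worker.should_run("task-1")

worker = Worker()                     # container reboots; memory is gone
assert worker.should_run("task-1")    # persisted task gets rescheduled
```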
noirbizarre/flask-restplus#777
For now we are using the proposed temporary fix: Werkzeug==0.16.1
JupyterLab caches its layout, which blocks the right file from opening if JupyterLab wasn't loaded in the iframe already.
Try to use nbconvert to catch server messages sent to kernel client and pipe those to container output.
The Enterprise Gateway environment variable EG_KERNEL_WHITELIST will possibly no longer include brackets in a new release.
When trying to create new users from Settings, clicking the button does not work as expected, as no request is sent to the auth service. Sending the POST request using curl works, as I've been able to create some user accounts this way.
The same thing happens at the Login page: clicking the Login button does not send any request to the auth service.
Note: I'm running Orchest with SSL enabled.
We might want to provide language APIs (through the `orchest-sdk`) to interact with the pipeline definition.
pipeline.add_step("new-title", "filename.py")
Not sure if we ever want to do this, or how exactly. But it's basically pipelines as code instead of as JSON files or visually editable pipelines.
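A hypothetical sketch of what such a pipelines-as-code API could look like; none of these names exist in the orchest-sdk today:

```python
import json
import uuid

class Pipeline:
    """Hypothetical pipelines-as-code wrapper around a pipeline definition."""

    def __init__(self, name: str):
        self.definition = {"name": name, "steps": {}}

    def add_step(self, title: str, filename: str) -> str:
        """Add a step and return its generated UUID."""
        step_uuid = str(uuid.uuid4())
        self.definition["steps"][step_uuid] = {
            "title": title,
            "file_path": filename,
            "incoming_connections": [],
        }
        return step_uuid

    def to_json(self) -> str:
        """Serialize back to a pipeline.json-style document."""
        return json.dumps(self.definition, indent=2)

pipeline = Pipeline("demo-pipeline")
pipeline.add_step("new-title", "filename.py")
```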
(Data science) Languages such as:
The idea is to use language-native interfaces to invoke Python, avoiding the effort of rewriting the SDK from Python in other languages.
For reference:
Even when it isn't when clicked during launch of pipeline.
This is used for data transfer.
Environment variables will replace the current Data sources:
The current idea around adding environment variables to Orchest is as follows:
- The `/data` concept is kept, and so host system file mounting is no longer supported (symlinks from the `/data` directory won't work due to Docker); instead, you need to put the data you want to use directly in the `/data` directory.

Implementation details:
- Environment variables are set through the `orchest-webserver` and persisted in the `orchest-api` whenever a job is started. This is similar to how we treat parameters.

If you are in the pipeline editor, you notice (when dragging the canvas or zooming out) that no canvas is being drawn to the left and top. When a component of the pipeline is placed there, canvas should dynamically be added at that position.
The idea is basically an implementation of the dynamic canvas spawning seen in https://draw.io/.
Add instructions to Docs and README for Windows to allow Docker to create folders and files.
This seems to only happen if the policies are set too strict.
Someone had this issue on Firefox (68.10) on Windows (10). The error was:
Content Security Policy: "x-frame-options" ignored due to "frame-ancestors" directive.
Using Chrome fixed it, but we can try to be as well-behaved as possible when it comes to content policies (we are serving from a single nginx proxy after all).
Generally speaking functionality can be shared across projects by creating a package or library and adding it as a dependency in the environments. For packages hosted in a private repository, environment variables (#124) will make it possible to supply credentials.
When it comes to making pipeline components (for example a notebook or script you want to share between pipelines in different projects) shareable, this cannot be done using a package. Instead you could put those scripts in a git repo and use it as a git submodule in the pipelines that make use of it. Additionally, we could support using code from the `/data` directory, from other steps or even from the pipeline editor. This should also aid the development process (similar to an editable install in pip).

Whilst thinking about this feature, it is good to keep in mind that one of the goals of Orchest is that a project (after importing) should be runnable. So all dependencies have to be completely resolvable by Orchest.
@howie6879 Did I write this up correctly, or did I miss anything?
When changing the size of the `memory-server` through the settings of a pipeline, the user is required to reboot the entire session, thus losing the state of all kernels.
It would be better if the user could restart just the memory-server itself for the changes to take effect. Or possibly resize it dynamically, but we don't think this is supported without losing the current objects in the Plasma store.
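A sketch of such a targeted restart using the Docker SDK for Python; the container naming scheme used here is an assumption for illustration:

```python
def restart_memory_server(docker_client, session_uuid: str) -> None:
    """Restart only the memory-server container of a session, leaving the
    kernel containers (and their state) untouched. The container name is
    a hypothetical naming scheme; Orchest's actual naming may differ."""
    container = docker_client.containers.get(f"memory-server-{session_uuid}")
    container.restart()

# Usage with the Docker SDK for Python (pip install docker):
# import docker
# restart_memory_server(docker.from_env(), "some-session-uuid")
```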
Windows is not required to be supported as a dev platform.
I tried out the installation steps but hit an error when running ./orchest install. See the detailed log below.
versions:
docker --version
Docker version 19.03.13, build 4484c46d9d
Orchest commit: 489bde8f2fe217e56e79cd55cb90d493d53006a3 (Jan 4)
Detailed logs:
Unable to find image 'orchest/orchest-ctl:latest' locally
latest: Pulling from orchest/orchest-ctl
6ec7b7d162b2: Already exists
80ff6536d04b: Pull complete
6c51d3836e95: Pull complete
6ce84404158b: Pull complete
6e001f327b45: Pull complete
31686f95ea4e: Pull complete
c6c989f83870: Pull complete
936cc2d383ad: Pull complete
Digest: sha256:e22ee169ea6709e29839a865cbd6ffc3f6d5e8390b1f94fb85edc3e920f888c4
Status: Downloaded newer image for orchest/orchest-ctl:latest
Installation might take some time depending on your network bandwidth. Starting installation...
Pulling images: 14/14|#############################################################################|
Orchest sends anonymized telemetry to analytics.orchest.io. To disable it, please refer to:
https://orchest.readthedocs.io/en/stable/user_guide/other.html#configuration
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 268, in _raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.40/networks/orchest
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/app/utils.py", line 156, in install_network
docker_client.networks.get(config.DOCKER_NETWORK)
File "/usr/local/lib/python3.7/site-packages/docker/models/networks.py", line 182, in get
self.client.api.inspect_network(network_id, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
return f(self, resource_id, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/docker/api/network.py", line 213, in inspect_network
return self._result(res, json=True)
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 274, in _result
self._raise_for_status(response)
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 270, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.NotFound: 404 Client Error for http+docker://localhost/v1.40/networks/orchest: Not Found ("network orchest not found")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 268, in _raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.40/networks/create
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/orchest", line 8, in <module>
sys.exit(__entrypoint())
File "/usr/local/lib/python3.7/site-packages/app/main.py", line 59, in __entrypoint
app()
File "/usr/local/lib/python3.7/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/usr/local/lib/python3.7/site-packages/app/main.py", line 124, in install
cmdline.install(lang)
File "/usr/local/lib/python3.7/site-packages/app/cmdline.py", line 64, in install
utils.install_network()
File "/usr/local/lib/python3.7/site-packages/app/utils.py", line 173, in install_network
config.DOCKER_NETWORK, driver="bridge", ipam=ipam_config
File "/usr/local/lib/python3.7/site-packages/docker/models/networks.py", line 156, in create
resp = self.client.api.create_network(name, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/docker/api/network.py", line 153, in create_network
return self._result(res, json=True)
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 274, in _result
self._raise_for_status(response)
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 270, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.40/networks/create: Internal Server Error ("failed to update bridge store for object
type *bridge.networkConfiguration: open /var/lib/docker/network/files/local-kv.db: read-only file system")
ERRO[0224] Error waiting for container: container 01b7da9a8884603afad1b15cd954e52866596226fa27c0eb41ceeedb022bf588: driver "btrfs" failed to remove root file
system: Failed to destroy btrfs snapshot /var/lib/docker/btrfs/subvolumes for 213212e826d1f2079d242a75aa60c79d7bbfe7a3ac1b67e0e641d532f80434b2: read-only file system
Shutting a session down is reasonably slow due to the graceful shutdown of the dockerized Jupyter kernels. However, if we could kill all session-related containers directly, shutdown should be faster.
As a consequence, rebooting would be faster as well.
Without SSH support, the user will always have to manually enter their username and password. Another possibility would be to do versioning not through Orchest, but if Orchest is installed on a cloud instance this is not ideal.
We need to discuss how this should work in a multi-user context.