netflix / metaflow Goto Github PK
:rocket: Build and manage real-life ML, AI, and data science projects with ease!
Home Page: https://metaflow.org
License: Apache License 2.0
First things first, thanks for opening up this project! We are working on a very similar project, and Metaflow is going to help us a lot.
Do you plan to create some kind of integration with Kubeflow Pipelines? For us, this would be very helpful for deploying these pipelines in our production environment.
Is it possible to use a local HPC or GPU cluster? I understand that it works perfectly with AWS, but what about when using AWS is not possible and other resources are available? Can Metaflow be configured to use them?
Thanks,
oriol
Thanks for open-sourcing this library. I was quite excited to take it for a spin, only to get the error "no module named 'fcntl'" and learn through #10, #23 and #46 that Windows is not supported, and there are no active plans for Windows support.
That is of course fine, but I have a few related questions.
I see #10 has a wontfix label and #46 has a help wanted label. That raises the question: would you be open to accepting contributions that add Windows support?
Do you know what the major technical obstacles to Windows support are?
Just now I see on the Installing Metaflow page "Metaflow is available as a Python package for MacOS and Linux." Perhaps if it was followed by a more explicit "Windows is not supported.", fewer people would miss this.
Would it be suitable to place this on the roadmap, perhaps stating that there are no Netflix plans but outside contributions are welcome?
After setting s3 bucket, an error occurs:
S3 datastore operation _put_s3_object failed (Parameter validation failed:
Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$"). Retrying 7 more times..
This is because, in metaflow/datastore/s3.py, urlparse cannot figure out the scheme and netloc correctly.
A simple workaround is replacing the following content:
```python
try:
    # python2
    from urlparse import urlparse
    import cStringIO
    BytesIO = cStringIO.StringIO
except:
    # python3
    from urllib.parse import urlparse
    import io
    BytesIO = io.BytesIO
```

with

```python
try:
    # python2
    from urlparse import urlparse as official_urlparse
    import cStringIO
    BytesIO = cStringIO.StringIO
except:
    # python3
    import io
    from urllib.parse import urlparse as official_urlparse
    BytesIO = io.BytesIO

# modified by Kevin, 07/12/2019
def urlparse(path):
    # prepend the s3:// scheme when it is missing so the bucket parses as netloc
    return official_urlparse(path if path.startswith('s3:') else 's3://' + path)
```
There should be a more elegant way to fix this issue.
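The underlying behavior is easy to verify with the standard library alone: without a scheme, urlparse puts the whole string in path and leaves netloc (the bucket name) empty, which is exactly the empty bucket name in the error above.

```python
from urllib.parse import urlparse

# Without a scheme, the whole string lands in `path` and `netloc` is empty,
# so the S3 datastore sees an empty bucket name.
u = urlparse("my-bucket/some/key")
assert u.netloc == ""
assert u.path == "my-bucket/some/key"

# With the s3:// scheme prepended, the bucket parses correctly.
u = urlparse("s3://my-bucket/some/key")
assert u.netloc == "my-bucket"
assert u.path == "/some/key"
```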
pip install metaflow
metaflow
Traceback (most recent call last):
File "C:\Program Files\Python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Program Files\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\Code\Python\CT83-PC\venv\metaflow\Scripts\metaflow.exe\__main__.py", line 5, in <module>
File "d:\code\python\ct83-pc\venv\metaflow\lib\site-packages\metaflow\__init__.py", line 45, in <module>
from .event_logger import EventLogger
File "d:\code\python\ct83-pc\venv\metaflow\lib\site-packages\metaflow\event_logger.py", line 1, in <module>
from .sidecar import SidecarSubProcess
File "d:\code\python\ct83-pc\venv\metaflow\lib\site-packages\metaflow\sidecar.py", line 4, in <module>
import fcntl
ModuleNotFoundError: No module named 'fcntl'
The module fcntl is not available on Windows systems, which makes it impossible to run Metaflow on Windows.
I am open to suggestions.
Update 1
This is what I ended up doing: I used Ubuntu with WSL.
https://stackoverflow.com/questions/45228395/error-no-module-named-fcntl
cs01/gdbgui#18
https://stackoverflow.com/questions/1422368/fcntl-substitute-on-windows
After following the 'MovieStatsFlow' tutorial and opening JupyterLab on the provided notebook, I get a 'UnicodeDecodeError' in the cell that fetches the latest successful run.
Exact error: UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 320: ordinal not in range(128)
An open-source version of issue #2 -- would love to be able to have Metaflow plugins that support Airflow and Kubernetes!
We currently deploy our machine learning models to Kubernetes as restful API-wrapped microservices, then create Airflow dags to orchestrate and schedule the execution of all the model components.
Admittedly not entirely familiar with what all Metaflow offers just yet, but would love to see seamless integrations with these other awesome open-source tools!
It doesn't appear that there is support within the Metaflow framework to clean out/purge the .metaflow directory created when running FlowSpecs locally. I imagine such a command might be a useful extension of the CLI, given that data scientists using Metaflow might not always audit their hidden local directories.
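A purge command along these lines could be a thin wrapper over directory removal. A minimal sketch (the helper name and behavior are my own suggestion, not an existing Metaflow command):

```python
import shutil
from pathlib import Path

def purge_local_metadata(flow_dir="."):
    """Delete the hidden .metaflow directory that local runs create
    next to the flow script. Safe to call when it does not exist."""
    target = Path(flow_dir) / ".metaflow"
    if target.is_dir():
        shutil.rmtree(target)
```

Note that this discards all locally stored run metadata and artifacts for flows run from that directory, so anything not persisted elsewhere is lost.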
There appear to be several cases of mutable arguments (e.g., dict, list) set as default values in functions or methods.
For example:
This pattern can often yield difficult-to-debug issues.
https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
Running pylint metaflow | grep W0102 can reveal the offending locations.
Full list of sites: https://gist.github.com/mpkocher/7d2db19fcde3fc8e728c6143817fc024
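For illustration, a minimal reproduction of the gotcha and the usual sentinel-based fix (function names here are mine, not from the Metaflow codebase):

```python
def append_bad(item, acc=[]):
    # The default list is created once, at definition time,
    # and shared across every call.
    acc.append(item)
    return acc

def append_good(item, acc=None):
    # Idiomatic fix: use None as a sentinel and build a fresh list per call.
    if acc is None:
        acc = []
    acc.append(item)
    return acc

assert append_bad(1) == [1]
assert append_bad(2) == [1, 2]   # surprise: state leaks across calls
assert append_good(1) == [1]
assert append_good(2) == [2]     # fresh list every call
```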
Is there a way to use tqdm inside a step (especially a foreach task)?
I want to have a progress bar for each parallel task.
Currently I am only able to see progress bars when a task has already successfully finished.
I am not familiar with how logging is handled in Metaflow, but here are some examples from tqdm that could help make the progress bar work:
https://github.com/tqdm/tqdm/blob/master/examples/parallel_bars.py
https://github.com/tqdm/tqdm/blob/master/examples/redirect_print.py
In general, it looks like messages are only printed once the task has completed.
This makes the use of tqdm pointless.
Is there a way to immediately print to console?
Here is the source code for the example from the gif:

```python
from metaflow import FlowSpec, step


class HelloFlow(FlowSpec):
    """
    A flow where Metaflow prints 'Hi'.

    Run this flow to validate that Metaflow is installed correctly.
    """

    @step
    def start(self):
        """
        This is the 'start' step. All flows must have a step named 'start' that
        is the first step in the flow.
        """
        print("HelloFlow is starting.")
        self.multi_processing = list(range(4))
        self.next(self.hello, foreach="multi_processing")

    @step
    def hello(self):
        """
        A step with parallel processing that should be monitored with tqdm.
        """
        from tqdm import tqdm
        from time import sleep
        from random import random

        interval = random() * 0.001
        for _ in tqdm(range(10000)):
            sleep(interval)
        self.next(self.join)

    @step
    def join(self, inputs):
        """
        Join our parallel branches and merge results.
        """
        self.next(self.end)

    @step
    def end(self):
        """
        This is the 'end' step. All flows must have an 'end' step, which is the
        last step in the flow.
        """
        print("HelloFlow is all done.")


if __name__ == '__main__':
    HelloFlow()
```
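One Metaflow-agnostic workaround to try (a sketch with hypothetical names, not a tqdm integration) is to emit complete, flushed lines at a coarse interval instead of tqdm's in-place carriage-return redraws, in case partial-line buffering is part of the problem:

```python
import sys

def report_progress(i, total, stream=sys.stdout):
    """Emit one complete, flushed line per update. Unlike tqdm's
    carriage-return redraws, whole lines survive log collectors
    that only forward finished lines."""
    stream.write("progress: %d/%d\n" % (i, total))
    stream.flush()

# e.g. inside the hello step, instead of the tqdm loop:
# for i in range(10000):
#     sleep(interval)
#     if i % 1000 == 0:
#         report_progress(i, 10000)
```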
What I did: ran pip install metaflow on Windows 10, then ran metaflow.
Is there a non-trivial example of a flow where steps are not running directly in the FlowSpec process, but in different Docker containers?

```python
@step
def a(self):
    # Step should be processed by a worker running "DockerImageA"
    self.next(self.b)

@step
def b(self):
    # Step should be processed by a worker running "DockerImageB"
    self.next(self.end)
```
As I've been looking through the code a bit, I'm running into a lot of "naked" except clauses. E.g.,

```python
try:
    return json.loads(value)
except:
    self.fail("%s is not a valid JSON object" % value, param, ctx)
```

There's a difference between except: and except Exception:.
https://docs.python.org/3/library/exceptions.html#exception-hierarchy
The list of potential issues can be obtained with pylint metaflow | grep W0702.
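The practical difference is easy to demonstrate: a bare except also swallows BaseException subclasses such as SystemExit and KeyboardInterrupt, while except Exception lets them propagate. A small self-contained illustration:

```python
def swallow_everything():
    try:
        raise SystemExit(1)
    except:            # bare except catches SystemExit too
        return "caught"

def swallow_errors_only():
    try:
        raise SystemExit(1)
    except Exception:  # SystemExit is not an Exception, so it propagates
        return "caught"

assert swallow_everything() == "caught"

propagated = False
try:
    swallow_errors_only()
except SystemExit:
    propagated = True
assert propagated
```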
In general, I would humbly suggest addressing some of the low-hanging fruit from pylint, as well as using a formatting tool such as black or autopep8.
```python
self.days = [0 to NOW]   # pseudocode: a list with one entry per day
self.next(self.compute_day, foreach='days')
```

This will work, but if I rerun it, it will recompute days 0 to NOW. I can easily "hack" around that by interfacing with S3 directly to skip running day 0 if we already have the results... but that breaks local testing.
There are a few ways of doing this (params, the client API, interfacing with S3 directly), but none of them are super elegant.
Is this a pattern that you've discussed? Is it a valid use case for Metaflow, or do you recommend delegating this to Airflow or similar?
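One low-tech way to express the skip-if-done pattern, independent of where the results live (a sketch; result_exists is a placeholder predicate you would back with S3, local disk, or the Metaflow client API):

```python
def days_to_compute(all_days, result_exists):
    """Filter the foreach list down to days that have no stored result,
    so completed days are skipped instead of recomputed."""
    return [d for d in all_days if not result_exists(d)]

# Example: days 0-2 already have results, only 3 and 4 remain.
done = {0, 1, 2}
assert days_to_compute(range(5), lambda d: d in done) == [3, 4]
```

For local testing, result_exists can simply be a lambda that always returns False, which sidesteps the "breaks local testing" problem mentioned above.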
A Metaflow Slack bot could be used to query the status of currently running runs, inspect the results of past runs, etc. In other words, it is, among other things, a convenient interface to Metaflow's client API.
Metaflow 2.0.1 executing DataSelectionFlow for user:neuron
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
2019-12-17 18:24:58.096 Workflow starting (run-id 21):
2019-12-17 18:24:58.831 [21/start/39 (pid 21279)] Task is starting.
2019-12-17 18:24:59.709 [21/start/39 (pid 21279)] INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
2019-12-17 18:25:00.013 [21/start/39 (pid 21279)] An error occurred (ClientException) when calling the SubmitJob operation: JobQueue arn:aws:batch:eu-west-1:<accountid>:job-queue/job-queue-<local username>-metaflow-test not found.
2019-12-17 18:25:00.723 [21/start/39 (pid 21279)] Task failed.
2019-12-17 18:25:00.723 This failed task will not be retried.
Internal error:
The end step was not successful by the end of flow.
At the same time, this works just fine:

```python
import os
import json
import boto3

client = boto3.client("batch")
queue = json.load(open(os.path.expanduser("~/.metaflowconfig/config.json"), "rb"))["METAFLOW_BATCH_JOB_QUEUE"]
client.list_jobs(jobQueue=queue)
```

This results in 'HTTPStatusCode': 200...
What am I doing wrong here?
My anonymized config:
```json
{
    "METAFLOW_BATCH_JOB_QUEUE": "arn:aws:batch:eu-west-1:<account id>:job-queue/job-queue-<aws username>-metaflow-test",
    "METAFLOW_DATASTORE_SYSROOT_S3": "s3://<aws username>-metaflow-test-metaflows3bucket-<bucket identifier>",
    "METAFLOW_DATATOOLS_SYSROOT_S3": "s3://<aws username>-metaflow-test-metaflows3bucket-<bucket identifier>/data",
    "METAFLOW_DEFAULT_DATASTORE": "s3",
    "METAFLOW_DEFAULT_METADATA": "service",
    "METAFLOW_ECS_S3_ACCESS_IAM_ROLE": "arn:aws:iam::<account id>:role/<aws username>-metaflow-test-BatchS3TaskRole-<random identifier>",
    "METAFLOW_SERVICE_INTERNAL_URL": "https://<random identifier>.execute-api.eu-west-1.amazonaws.com/api/",
    "METAFLOW_SERVICE_URL": "https://<random identifier>.execute-api.eu-west-1.amazonaws.com/api/"
}
```
Metaflow on AWS currently requires a human-in-the-loop to execute and cannot automatically be scheduled. Metaflow could be made to work with AWS Step functions to allow the orchestration of Metaflow steps to be done by AWS.
```
metaflow tutorials pull
cd metaflow-tutorials
metaflow configure aws
python 05-helloaws/helloaws.py run
```
As soon as you see the output "Task is starting (status STARTING)...", perform a keyboard interrupt (Ctrl+C) to stop the workflow. Note: because this hello AWS example runs so quickly, it may be easier if you add a time.sleep(10) and interrupt during that delay.
View the AWS Batch Job console and notice the job is not terminated.
The template should set the following environment variables for the Metaflow service; otherwise the auth endpoint will fail. Currently these are only being set when sandbox is set to true, but they should also be set when a user decides to take the template and run it in their own account.
Would be good to know the status of the build. @savingoyal
Trying to run the stats tutorial (02-statistics) fails with Python 3.7 (AttributeError: type object 'Callable' has no attribute '_abc_registry') while succeeding with Python 2.7.
The current handling of interrupted flows (i.e., when the user hits Ctrl-C while a flow is running) has two issues:
It should be possible to:
This would both eliminate the 1-second minimum time per task and avoid (or at least mitigate) early kills.
Currently the only automated way to spin up AWS infra is via https://github.com/Netflix/metaflow-tools/blob/master/aws/cloudformation/metaflow-cfn-template.yml
A lot of companies have adopted Terraform instead of CloudFormation, and it would be nice to get easier buy-in from other departments.
I cannot successfully finish the AWS setup. I use the CloudFormation template here: https://github.com/Netflix/metaflow-tools/tree/master/aws/cloudformation and it gives me all the resources I need.
When I run metaflow configure aws, as I understand it, I need to enter the output resource ARNs there. But I notice:
Please enter the job queue to use for batch: -> Queue name or ARN?
Please enter the IAM role to use for the container to get AWS S3 access -> Is it ECSJobRole from the CF outputs?
Please enter the URL for your metadata service: -> Is it the ServiceUrl? There's another one, InternalServiceUrl.
Please enter the default container image to use -> I cannot find any instruction here. I assume we at least need a Python env in the base container image?
Please enter the container registry -> Should this be <account_id>.dkr.ecr.us-west-2.amazonaws.com?
I would suggest improving the doc here:
https://docs.metaflow.org/metaflow-on-aws/deploy-to-aws
All,
I'm working on setting up a new DSS-8440 and am evaluating different management options. It appears that Slurm is best for job scheduling. Does metaflow support or have any integration with Slurm? Alternatively, are there any tips for handling machines like this?
Thanks!
I followed the guidance to set up Metaflow on AWS, but METAFLOW_SERVICE_URL is not part of the configuration flow. When I check ~/.metaflowconfig/config.json, it only has METADATA_SERVICE_URL. It seems the step in the flow sets the variable METADATA_SERVICE_URL but not METAFLOW_SERVICE_URL.
Have you setup your AWS credentials? [y/N]: y
Do you want to use AWS S3 as your datastore? [Y/n]: Y
AWS S3
Please enter the bucket prefix to use for your flows: metaflow3-metaflows3bucket-pxxxe
Please enter the bucket prefix to use for your data [metaflow3-metaflows3bucket-pxxxe/data]:
Do you want to use AWS Batch for compute? [Y/n]: y
AWS Batch
Please enter the job queue to use for batch: arn:aws:batch:us-west-2:<account_id>:job-queue/job-queue-metaflow3
Please enter the default container image to use:
Please enter the default container image to use: continuumio/anaconda
Please enter the container registry: <account_id>.dkr.ecr.us-west-2.amazonaws.com/metaflow
Please enter the IAM role to use for the container to get AWS S3 access: arn:aws:iam::<account_id>:role/metaflow3-BatchS3TaskRole-1IXNDQD2ND1AL
Do you want to use a (remote) metadata service? [Y/n]: y
Metadata service
Please enter the URL for your metadata service: https://tgp9.execute-api.us-west-2.amazonaws.com/api/
Do you want to use conda for dependency management? [Y/n]: Y
Conda on AWS S3
Please enter the bucket prefix for storing conda packages [metaflow3-metaflows3bucket-pxxxe/conda]:
➜ metaflow-tutorials python3 00-helloworld/helloworld.py run
Metaflow 2.0.0 executing HelloFlow for user:shjiaxin
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
Flow failed:
Missing Metaflow Service URL. Specify with METAFLOW_SERVICE_URL environment variable
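Until the configuration step is fixed, a possible stopgap (an assumption on my part, not an official fix) is to export the variable the runtime asks for, reusing the metadata service URL you entered during metaflow configure aws:

```shell
# Hypothetical workaround: expose the URL under the name the runtime expects.
# Replace the placeholder with your own metadata service endpoint.
export METAFLOW_SERVICE_URL="https://tgp9.execute-api.us-west-2.amazonaws.com/api/"
```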
As per https://docs.metaflow.org/metaflow-on-aws/deploy-to-aws there are both optional and default fallbacks for variables. However, with metaflow configure aws you are required to fill them in instead of using the defaults.
Currently, Metaflow is set up to work with AWS as the default public cloud. The architecture of Metaflow allows for additional public clouds to be supported.
Adding support for Microsoft Azure might broaden the potential user base, which could increase the adoption rate. This, in turn, could lead to increased community attention.
Metaflow is currently a Python library. Provide R bindings that would allow a Flow to be written entirely in R and use the Python library as a backend.
Metaflow tries to make the life of data scientists easier; this sometimes means providing ways to optimize certain common but expensive operations. Processing large dataframes in memory can be difficult and Metaflow could provide ways to do this more efficiently.
Usage of decorators (for example, for performance tracking of flow steps) is blocked by Metaflow:
```python
class MyFlow(FlowSpec):

    @step
    @track_memory_usage
    @track_time_usage
    def start(self):
        ...
        self.next(self.second_step)

    @step
    @track_memory_usage
    @track_time_usage
    def second_step(self):
        ...
```
Results in:
2019-12-15 10:06:12.910 [1576400770037993/start/1 (pid 10217)] Elapsed time in <function MyFlow.start at 0x7fd831923510>: 1.057846 s
2019-12-15 10:06:12.910 [1576400770037993/start/1 (pid 10217)] Memory in <function track_time_usage.<locals>.track_time_usage_wrapper at 0x7fd831923598>: 179 -> 241 MB
2019-12-15 10:06:12.916 [1576400770037993/start/1 (pid 10217)] Task finished successfully.
2019-12-15 10:06:12.930 [1576400770037993/split/2 (pid 10229)] Task is starting.
2019-12-15 10:06:17.351 [1576400770037993/split/2 (pid 10229)] <flow MyFlow step second_step> failed:
2019-12-15 10:06:17.351 [1576400770037993/split/2 (pid 10229)] Invalid self.next() transition detected on line 61:
2019-12-15 10:06:17.351 [1576400770037993/split/2 (pid 10229)] Step start specifies a self.next() transition to an unknown step, track_memory_usage_decorator_wrapper.
I am trying to run the metaflow-tutorials on my local Mac.
After:

```
pip install metaflow
metaflow
cd 00-helloworld
python 00-helloworld/helloworld.py show
```

It shows the error:
Metaflow could not determine your user name based on environment variables ($USERNAME etc.)
Did I miss some step?
Thanks
First of all, thank you for open-sourcing this excellent tool!
My team uses GCP, not AWS, so if metaflow could be integrated with GCP, that would be great. I'm sure it's on your roadmap, but just putting it out there :)
Thanks a lot for open-sourcing this great library. Is it possible to provide more real-world examples of using this tool? It would be really helpful to have a real-world example that goes through a whole Data Science or Machine Learning project life cycle, such as data loading/cleaning, parameter tuning, model deployment and performance monitoring. Many thanks!
I really like the idea and structure of metaflow. For my use case, it looks like it could simultaneously solve a lot of different problems. That said, is there any way to disable versioning and archiving of specific artifacts? If I can guarantee that my upstream data source is versioned and archived appropriately, then I don't necessarily want duplication of all artifacts (because of the storage overhead).
I could just remove certain artifacts after a time, but this would require the cleaning tool to know what should and shouldn't be archived. It would be nicer if there was some syntax to declare an artifact as transient, or at the very least a call we can make at the end of a flow to dispose of artifacts that shouldn't be versioned.
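One workaround that may already be possible: only the attributes still present on self at the end of a step appear to be persisted, so deleting an intermediate before the step returns should keep it out of the datastore. A toy illustration of that idea (plain Python standing in for a step, not actual Metaflow machinery):

```python
class StepState:
    """Stand-in for a flow's `self` inside a step."""

def run_step():
    s = StepState()
    s.big_intermediate = list(range(1000))   # large, externally versioned data
    s.result = sum(s.big_intermediate)       # the artifact we actually want
    del s.big_intermediate                   # drop it before the step ends
    return vars(s)                           # what would be persisted

persisted = run_step()
assert "result" in persisted
assert "big_intermediate" not in persisted
```

This is still a manual convention rather than the declared-transient syntax proposed above, but it avoids the post-hoc cleaning-tool problem.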
Hi
How do you deal with GDPR in the internal data stores? They are versioned and stored over time in permanent storage, and some are likely subject to GDPR.
metaflow.s3.get_many (and the other get* methods) will download the files to a local cache dir but don't maintain the original directory structure.
This is fine when the task needs access to single files at a time (the path can be accessed from the resulting S3Object), but there are use cases where an internal library expects to get a subdirectory with a specific structure (like shared Parquet datasets).
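One way to bridge the gap today is to re-create the key hierarchy under a destination directory after downloading. A sketch (the (key, local_path) pairs would be built from the objects get_many returns; the helper name is mine):

```python
import os
import shutil

def restore_layout(downloads, dest):
    """Re-create the original S3 key hierarchy under `dest`.
    `downloads` is a list of (key, local_path) pairs; each cached file
    is copied to the relative path its key describes."""
    for key, local_path in downloads:
        target = os.path.join(dest, key)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy(local_path, target)
```

After this, a library expecting a shared Parquet dataset directory can be pointed at dest.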
I am currently playing around with Metaflow and having problems using it in combination with TensorFlow. I am trying to define, train, and evaluate a model defined with the Keras API in separate steps. The program crashes at the end of the step that defines the model, since Metaflow tries to store the model as an artifact using pickle, which is apparently not supported by TensorFlow models. The error message is "TypeError: can't pickle _thread._local objects".
I do not think this is an issue that can necessarily be fixed in Metaflow, since pickling is not supported by TensorFlow models in general. However, I was hoping that someone knows a way to use TensorFlow models within Metaflow and could share that knowledge.
If it helps, here is some example code and the traceback produced when running it (this is using tensorflow 2.0.0):
```python
import tensorflow as tf
from metaflow import FlowSpec, step


class ExampleFlow(FlowSpec):
    """Example of a flow using a tensorflow.keras model"""

    @step
    def start(self):
        """Defines a model."""
        self.model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(4, input_shape=(4, ), activation='relu'),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        self.model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=['accuracy']
        )
        self.next(self.end)

    @step
    def end(self):
        """Uses the model defined in the prior step."""
        self.model.summary()


if __name__ == "__main__":
    ExampleFlow()
```
Metaflow 2.0.0 executing ExampleFlow for user:mfr
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
2019-12-11 19:28:41.009 Workflow starting (run-id 1576088921002498):
2019-12-11 19:28:41.020 [1576088921002498/start/1 (pid 7847)] Task is starting.
2019-12-11 19:28:43.109 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.109000: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-11 19:28:43.136 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.135689: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2096165000 Hz
2019-12-11 19:28:43.138 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.137785: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5638147fe6d0 executing computations on platform Host. Devices:
2019-12-11 19:28:43.263 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.137882: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-12-11 19:28:43.263 [1576088921002498/start/1 (pid 7847)] Internal error
2019-12-11 19:28:43.264 [1576088921002498/start/1 (pid 7847)] Traceback (most recent call last):
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/cli.py", line 853, in main
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] start(auto_envvar_prefix='METAFLOW', obj=state)
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 764, in call
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] return self.main(args, kwargs)
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 717, in main
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] rv = self.invoke(ctx)
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] return _process_result(sub_ctx.command.invoke(sub_ctx))
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 956, in invoke
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] return ctx.invoke(self.callback, ctx.params)
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 555, in invoke
2019-12-11 19:28:43.267 [1576088921002498/start/1 (pid 7847)] return callback(args, kwargs)
2019-12-11 19:28:43.749 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/decorators.py", line 27, in new_func
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] return f(get_current_context().obj, args, kwargs)
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/cli.py", line 430, in step
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] max_user_code_retries)
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/task.py", line 447, in run_step
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] output.persist(self.flow)
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 50, in method
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] return f(self, args, kwargs)
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 507, in persist
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] sha, size, encoding = self._save_object(obj, var, force_v4)
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 431, in _save_object
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] transformable_obj.transform(lambda x: pickle.dumps(x, protocol=2))
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 68, in transform
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] temp = transformer(self._object)
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 431, in
2019-12-11 19:28:43.752 [1576088921002498/start/1 (pid 7847)] transformable_obj.transform(lambda x: pickle.dumps(x, protocol=2))
2019-12-11 19:28:43.752 [1576088921002498/start/1 (pid 7847)] TypeError: can't pickle _thread._local objects
2019-12-11 19:28:43.752 [1576088921002498/start/1 (pid 7847)]
2019-12-11 19:28:43.754 [1576088921002498/start/1 (pid 7847)] Task failed.
2019-12-11 19:28:43.754 Workflow failed.
Step failure:
Step start (task-id 1) failed.
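A common workaround is to persist only picklable state (for Keras, roughly the weights via get_weights() plus the architecture via to_json()) and rebuild the model in the next step. The failure mode and the workaround can be illustrated with the standard library alone (ModelLike is a hypothetical stand-in, not a real Keras model):

```python
import pickle
import threading

class ModelLike:
    """Stand-in for a tf.keras model: holds an unpicklable handle
    (a thread-local, the same object class named in the traceback)
    alongside plain, picklable state."""
    def __init__(self):
        self.session = threading.local()   # cannot be pickled
        self.weights = [0.1, 0.2]          # plain, picklable state

m = ModelLike()

# Pickling the whole object fails, like Metaflow's artifact persist step.
failed = False
try:
    pickle.dumps(m)
except TypeError:
    failed = True
assert failed

# Storing only the serializable state works, and the model can be
# reconstructed from it in a later step.
blob = pickle.dumps(m.weights)
assert pickle.loads(blob) == [0.1, 0.2]
```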
Python environment: pyenv local with Python 3.6.8
Metaflow 2.0.0 executing PlayListFlow for user:minhtue
The next version of our playlist generator that uses the statistics
generated from 'Episode 02' to improve the title recommendations.
The flow performs the following steps:
1) Load the genre specific statistics from the MovieStatsFlow.
2) In parallel branches:
- A) Build a playlist from the top grossing films in the requested genre.
- B) Choose a random movie.
3) Join the two to create a movie playlist and display it.
Step start
Use the Metaflow client to retrieve the latest successful run from our
MovieStatsFlow and assign them as data artifacts in this flow.
=> bonus_movie, genre_movies
Step bonus_movie
This step chooses a random title for a different movie genre.
=> join
Step genre_movies
Select the top performing movies from the user specified genre.
=> join
Step join
Join our parallel branches and merge results.
=> end
Step end
Print out the playlist and bonus movie.
➜ metaflow-tutorials python 03-playlist-redux/playlist.py run
Metaflow 2.0.0 executing PlayListFlow for user:minhtue
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
2019-12-03 16:46:33.660 Workflow starting (run-id 1575420393649410):
2019-12-03 16:46:33.681 [1575420393649410/start/1 (pid 34515)] Task is starting.
2019-12-03 16:46:34.074 [1575420393649410/start/1 (pid 34515)] <flow PlayListFlow step start> failed:
2019-12-03 16:46:34.075 [1575420393649410/start/1 (pid 34515)] Object not found:
2019-12-03 16:46:34.075 [1575420393649410/start/1 (pid 34515)] Using metadata provider: local@/Users/minhtue/workspace/metaflow/metaflow-tutorials
2019-12-03 16:46:34.075 [1575420393649410/start/1 (pid 34515)] Flow('MovieStatsFlow') does not exist
2019-12-03 16:46:34.126 [1575420393649410/start/1 (pid 34515)]
2019-12-03 16:46:34.131 [1575420393649410/start/1 (pid 34515)] Task failed.
2019-12-03 16:46:34.131 Workflow failed.
Step failure:
Step start (task-id 1) failed.
Metaflow is unable to handle the following graph, expecting the branches to converge in a single vertex.

```python
from metaflow import FlowSpec, step


class SayHelloMetaFlow(FlowSpec):

    @step
    def start(self):
        print('start')
        self.next(self.say, self.hello, self.metaflow)

    @step
    def say(self):
        self.shout = 'say'
        self.next(self.say_hello)

    @step
    def hello(self):
        self.shout = 'hello'
        self.next(self.say_hello)

    @step
    def metaflow(self):
        self.shout = 'metaflow'
        self.next(self.say_hello_metaflow)

    @step
    def say_hello(self, inputs):
        self.shout = f'{inputs.say.shout} {inputs.hello.shout}'
        self.next(self.say_hello_metaflow)

    @step
    def say_hello_metaflow(self, inputs):
        print(inputs.say_hello.shout, inputs.metaflow.shout)
        self.next(self.end)

    @step
    def end(self):
        print('end')


if __name__ == '__main__':
    SayHelloMetaFlow()
```
Metaflow 2.0.0 executing SayHelloMetaFlow for user:...
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
Workflow starting (run-id ...):
[.../start/1 (pid ...)] Task is starting.
[.../start/1 (pid ...)] start
[.../start/1 (pid ...)] Task finished successfully.
[.../say/2 (pid ...)] Task is starting.
[.../hello/3 (pid ...)] Task is starting.
[.../metaflow/4 (pid ...)] Task is starting.
[.../say/2 (pid ...)] Task finished successfully.
[.../hello/3 (pid ...)] Task finished successfully.
[.../say_hello/5 (pid ...)] Task is starting.
[.../say_hello/5 (pid ...)] Task finished successfully.
[.../metaflow/4 (pid ...)] Task finished successfully.
[.../say_hello_metaflow/6 (pid ...)] Task is starting.
[.../say_hello_metaflow/6 (pid ...)] say hello metaflow
[.../say_hello_metaflow/6 (pid ...)] Task finished successfully.
[.../end/7 (pid ...)] Task is starting.
[.../end/7 (pid ...)] end
[.../end/7 (pid ...)] Task finished successfully.
Done!
Metaflow 2.0.0 executing SayHelloMetaFlow for user:...
Validating your flow...
Validity checker found an issue on line 25:
Step say_hello seems like a join step (it takes an extra input argument) but an incorrect number of steps (hello, say) lead to it. This join was expecting 3 incoming paths, starting from splitted step(s) say, hello, metaflow.
It would be extremely helpful for Metaflow to support DAGs in full.
Currently I am using pyenv local for my Python environment
➜ metaflow-tutorials python 04-playlist-plus/playlist.py run
Metaflow 2.0.0 executing PlayListFlow for user:minhtue
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
Incompatible environment:
The @conda decorator requires --environment=conda
Once Metaflow has been used to train a model, it produces artifacts that are typically persisted (for example in S3). A natural extension of this is to provide an easy mechanism to deploy web services that would take these artifacts and serve them in some way so that they can be consumed by downstream applications.
Hello metaflow,
I am interested in learning about the metaflow offering but am hitting a snag in the very first tutorial:
ฮป python 00-helloworld/helloworld.py show
Traceback (most recent call last):
  File "00-helloworld/helloworld.py", line 1, in <module>
    from metaflow import FlowSpec, step
  File "C:\Users\Nick.Franciose\AppData\Local\Programs\Python\Python36\lib\site-packages\metaflow\__init__.py", line 45, in <module>
    from .event_logger import EventLogger
  File "C:\Users\Nick.Franciose\AppData\Local\Programs\Python\Python36\lib\site-packages\metaflow\event_logger.py", line 1, in <module>
    from .sidecar import SidecarSubProcess
  File "C:\Users\Nick.Franciose\AppData\Local\Programs\Python\Python36\lib\site-packages\metaflow\sidecar.py", line 4, in <module>
    import fcntl
ModuleNotFoundError: No module named 'fcntl'
Stack Overflow suggests fcntl is Linux-specific. Is this offering Windows-compatible? If so, is there a workaround?
Best,
Nick
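For context on the failing import: fcntl is a POSIX-only standard-library module, so any unconditional `import fcntl` fails on Windows. A hedged sketch of the usual guard pattern (this is not Metaflow's actual code):

```python
import sys

# fcntl only exists on POSIX systems; guard the import so the module
# can at least be loaded on Windows, with locking disabled.
if sys.platform != "win32":
    import fcntl
else:
    fcntl = None

def file_locking_available():
    """True when POSIX fcntl-based file locking can be used."""
    return fcntl is not None
```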
Another implementation of #16:
The idea is to give Metaflow a native Kubernetes implementation, using Argo (https://github.com/argoproj/argo) as the workflow engine.
If I have a flow defined with steps Foo, Bar, and Baz,
and I have a run where Baz fails,
and I explicitly resume my run at Baz,
then I want to know what causes Foo or Bar to re-run versus being cloned, so I can rewrite my code to avoid the re-run.
The current documentation says
https://docs.metaflow.org/metaflow/debugging#resuming-from-an-arbitrary-step
By default, resume resumes from the step that failed, like b above. Sometimes fixing the failed step requires re-execution of some steps that precede it.
That is pretty vague, so the user is left guessing.
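For reference, the documented CLI forms being discussed are (step name Baz taken from the example above):

```shell
python myflow.py resume        # resume from the step that failed
python myflow.py resume Baz    # explicitly resume from step Baz
```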
It would be nice to have this even when you are not using Conda, the purpose being to track the plain (pip) dependencies as part of the flow.
(py3) [temp]> python -V
Python 3.5.2
(py3) [temp]> metaflow
Traceback (most recent call last):
  File "/Users/.../venv/py3/bin/metaflow", line 5, in <module>
    from metaflow.main_cli import main
  File "/Users/.../venv/py3/lib/python3.5/site-packages/metaflow/main_cli.py", line 243, in <module>
    @click.argument('episode', autocompletion=autocomplete_episodes)
  File "/Users/.../venv/py3/lib/python3.5/site-packages/click/decorators.py", line 151, in decorator
    _param_memo(f, ArgumentClass(param_decls, **attrs))
  File "/Users/.../venv/py3/lib/python3.5/site-packages/click/core.py", line 1699, in __init__
    Parameter.__init__(self, param_decls, required=required, **attrs)
TypeError: __init__() got an unexpected keyword argument 'autocompletion'
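One hedged reading of this traceback: the `autocompletion` argument to `@click.argument` exists only in the click 7.x series (it was added in 7.0 and later removed in favour of `shell_complete`), so a mismatched click version in the environment raises this `TypeError`. A possible workaround (the exact pin is an assumption; check Metaflow's declared requirements):

```shell
pip install 'click>=7.0,<8.0'
```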
Currently, Metaflow is set up to work with AWS as the default public cloud. The architecture of Metaflow allows for additional public clouds to be supported.
Adding support for Google Cloud Platform would broaden the potential user base, which could increase the adoption rate. This, in turn, could lead to increased community attention.
Provide a Metaflow Web UI to support flow visualization. Understanding and debugging flows is increasingly important, especially for deep learning. While we have made some important first steps with visualization tools for flows, much more needs to be done to enable data scientists to understand, debug, and tune their flows, and for users to trust the results.