Comments (5)
Hey @shchur, I wasn't able to reproduce this. I'm attaching a screenshot of a successful run with @batch.
Could you perhaps:
- post the contents of `0.runtime_stderr.log` and `0.runtime_stdout.log`
- try a newer metaflow version, i.e. `2.10.8`, since you use `2.10.3`
from metaflow.
Thank you for the quick response! I tried using `2.10.8` and encountered the same error. Here are the contents of `s3://****-metaflows3bucket-vqcend9aqn9z/HelloFlow/1704868945391404/hello/2/`:
`0.runtime_stderr.log`:
[MFLOG|0|2024-01-10T06:46:03.311728Z|runtime|91c794bc-1325-4cc5-98d7-3939daef119f] Data store error:
[MFLOG|0|2024-01-10T06:46:03.311985Z|runtime|735f3f38-9a60-4978-aefb-6f69aff0d75a] No completed attempts of the task was found for task 'HelloFlow/1704868945391404/hello/2'
[MFLOG|0|2024-01-10T06:46:03.691040Z|runtime|7f210482-078e-4217-abc9-92b1901c9e3c]
[MFLOG|0|2024-01-10T06:46:04.086122Z|runtime|e0e3f263-4ef9-404f-8196-134901d480d3]Task failed.
`0.runtime_stdout.log`:
[MFLOG|0|2024-01-10T06:42:38.449017Z|runtime|dfcf4833-1cab-43fd-996e-4edefcc22977][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:43:08.649344Z|runtime|e2a6331b-df2c-42ec-b401-3c0463190c3a][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:43:38.852722Z|runtime|9ebac194-f505-491e-8a17-e45a25339148][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:44:09.018890Z|runtime|085c5b85-e151-4248-92b5-49cff2390b1f][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:44:39.160499Z|runtime|4d066689-d3e0-45a1-a9ee-6aaad1fe506d][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:45:09.384003Z|runtime|de486574-58a1-4c38-b52f-326f3237b4bc][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:45:39.528610Z|runtime|ee0bd5f6-3739-4242-bab8-4eedccdb964e][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status RUNNABLE)...
[MFLOG|0|2024-01-10T06:45:45.104157Z|runtime|f36f4539-ceaa-490d-a4a0-afb0ec2ec232][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status STARTING)...
[MFLOG|0|2024-01-10T06:46:00.746162Z|runtime|a324752b-a551-46e4-9db0-f347123d424c][9814be19-a252-4491-b6a7-3fe6b13bfa69] Task is starting (status FAILED)...
`0.runtime`:
{"return_code": 1, "killed": false, "success": false}
`0.attempt.json`:
{"time": 1704868953.193214}
I suspect that the problem lies in the AWS configuration, but I'm not sure how to get to its root cause. I used the CloudFormation template without any modifications, and stack creation finished with `CREATE_COMPLETE` status.
Are there any additional log statements that I could add to the Metaflow code to get a more informative error message / understand which exact operation is failing?
I found the source of the problem: my working directory included a folder called `metaflow`, which crashed the metaflow command executed during env setup.
mkdir: cannot create directory ‘metaflow’: File exists
It might be helpful to check for the presence of this folder before submitting the job and raise an informative error message to the user.
As for debugging jobs on AWS Batch, I was able to find the detailed log with the error message in the Amazon Elastic Container Service console under Clusters.
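The same failure details can probably be fetched without the console. Below is a minimal sketch using boto3's `describe_jobs` call on the Batch client. It assumes that the UUID in square brackets in `0.runtime_stdout.log` is the AWS Batch job ID, which this thread does not confirm. The helper names `summarize_batch_job` and `describe_failed_job` are made up for illustration and are not part of Metaflow.

```python
def summarize_batch_job(job):
    """Extract the failure-related fields from one entry of describe_jobs()['jobs']."""
    container = job.get("container", {})
    return {
        "status": job.get("status"),
        "status_reason": job.get("statusReason"),
        "container_reason": container.get("reason"),
        "exit_code": container.get("exitCode"),
        # Name of the CloudWatch Logs stream that holds the container output.
        "log_stream": container.get("logStreamName"),
    }


def describe_failed_job(job_id, region_name=None):
    """Look up a Batch job by ID and summarize why it failed.

    Needs boto3 and AWS credentials; job_id is assumed to be a Batch job ID.
    """
    import boto3  # imported lazily so summarize_batch_job works without it

    client = boto3.client("batch", region_name=region_name)
    response = client.describe_jobs(jobs=[job_id])
    return [summarize_batch_job(job) for job in response["jobs"]]
```

Calling `describe_failed_job("<job id>")` returns one summary per matching job. Fields that Batch did not report come back as `None`.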
My problem is solved, but I'm keeping this issue open in case you want to add an informative error message for the edge case.
Thank you for building and maintaining such an amazing framework!
@shchur I wasn't able to reproduce this. Can you please share your directory structure?
Mine looks like the following:
and I submit the flow with `python hello.py run --with batch`
@madhur-ob The problem was that I built the Docker image used by Metaflow in the same directory. In other words, if the `metaflow` folder is present in the working directory of the Docker image, the Batch job crashes because it cannot unpack the archive.
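The check suggested earlier in the thread could look roughly like the sketch below. It is not Metaflow code: `ensure_no_conflicting_dir` and `ConflictingDirectoryError` are hypothetical names. Because the conflicting folder lives in the image's working directory, the sketch assumes the check runs inside the container before the job setup creates its own `metaflow` directory.

```python
from pathlib import Path


class ConflictingDirectoryError(RuntimeError):
    """Raised when the working directory already contains a reserved name."""


def ensure_no_conflicting_dir(workdir=".", reserved="metaflow"):
    """Fail early if `workdir` already contains an entry named `reserved`.

    Hypothetical helper for illustration; not part of Metaflow.
    """
    candidate = Path(workdir) / reserved
    if candidate.exists():
        raise ConflictingDirectoryError(
            f"'{candidate}' already exists. Job setup needs to create a "
            f"'{reserved}' directory here, so rename or remove the existing "
            f"one, or change the working directory of the Docker image."
        )
```

Calling `ensure_no_conflicting_dir()` at container start would replace the bare `mkdir: cannot create directory` failure with a message that names the cause.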