Comments (3)
We should at least stream the logs to a file and tell the user to run tail -F path/to/logfile
if they're curious about progress. When I try it now, I have to stare at the screen for 10 minutes with no further output after these lines:
I 11-16 19:37:15 cloud_vm_ray_backend.py:210] Launching on AWS us-west-2 (us-west-2a,us-west-2b,us-west-2c,us-west-2d).
I 11-16 19:37:15 cloud_vm_ray_backend.py:213] If this takes longer than ~30 seconds, provisioning is likely successful. Setting up may take a few minutes.
I 11-16 19:37:15 backend_utils.py:38] Created or updated file config/aws-ray.yml
Yes, the output is expected not to show, although we can change that. The rationale is that if you print ray up's stdout/stderr and there are retries, the output gets very crowded. Give it a try by setting a p3.16x requirement.
We could try writing the stdout/stderr (which are captured in variables) to files. This is similar to V1's logging.
+1 to stream to a file and hint on tail -f.
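For illustration, here is a minimal sketch of what "stream to a file and hint on tail" could look like. It is not SkyPilot's actual implementation: the helper name run_with_log and the log path are hypothetical, and the demo command is a stand-in for the real provisioning call.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log(cmd, log_path):
    """Run cmd, writing combined stdout/stderr to log_path line by line.

    Hypothetical helper for illustration only. Returns the exit code.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Tell the user where to look instead of leaving a blank screen.
    print(f"Streaming output to {log_path}")
    print(f"To follow progress, run: tail -F {log_path}")
    with open(log_path, "w") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
        )
        for line in proc.stdout:
            log_file.write(line)
            log_file.flush()  # make each line visible to tail right away
        return proc.wait()


if __name__ == "__main__":
    # Stand-in command; a real caller would pass the provisioning command.
    demo_log = Path(tempfile.mkdtemp()) / "provision.log"
    exit_code = run_with_log(
        [sys.executable, "-c", "print('provisioning...')"], demo_log
    )
    print(f"Command exited with code {exit_code}")
```

Flushing after every line keeps the file current for anyone tailing it, and since retries only append more lines to the file, the terminal stays uncluttered.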