Comments (9)
Hey @arkadiusz-czerwinski , thanks for writing to W&B support. We'll investigate this on our end and get back to you soon.
from wandb.
Thank you for being so cooperative.
from wandb.
Hey @arkadiusz-czerwinski ,
- A job run will be reported as a failure if it doesn't include a `wandb.init` call to create a run, which I think might have happened in your case. This was another gap in our documentation, apologies.
- If you look to the far right of the job run's view under the queue, do you see an error icon (example in the video below, bottom right)? This should immediately open an error modal. (We've heard from other customers who missed this, so we're working on making it more prominent and discoverable.)
launch_queue_error_modal.mov
Please let me know if this video helps you locate the stack trace on your end.
from wandb.
Hi @arkadiusz-czerwinski , I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.
from wandb.
Hi. Thank you for the reminder. In our setup, the script starts with wandb init. The error, however, states that the script failed before reaching wandb init, and nothing more is shown even though the highest verbosity settings were selected. @anmolmann
from wandb.
Hi @arkadiusz-czerwinski , could you share your queue config and the script as well? This will help us reproduce the issue on our end for further investigation.
In addition to the above, launch jobs in the run queue do not always show the underlying error message, especially when the job fails before init. We have made progress here in the last few months, as seen in the video shared above, but I can't guarantee that we will extract the exact failure message every time. We'll keep working on improvements and will investigate the case you brought up, though I don't anticipate us achieving perfection in that regard.
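In the meantime, one possible workaround for failures that happen before init is to have the script print its own stack trace. The snippet below is only a sketch, not something launch provides: `run_with_visible_errors` and `train` are hypothetical names, and it assumes the container's stderr ends up somewhere you can read it (for example, the container logs).

```python
import sys
import traceback


def run_with_visible_errors(entry_point):
    """Call entry_point(); on failure, print the full traceback and return 1."""
    try:
        entry_point()
    except Exception:
        # Write the stack trace to stderr before exiting, so the failure is
        # visible even if no run was ever created.
        traceback.print_exc(file=sys.stderr)
        sys.stderr.flush()
        return 1
    return 0


def train():
    # Hypothetical entry point. A real script would call wandb.init(...) here;
    # this stand-in simulates a crash that happens before that call.
    raise RuntimeError("failed before wandb.init was reached")


# A real script would finish with sys.exit(exit_code).
exit_code = run_with_visible_errors(train)
```

The wrapper returns a non-zero code on failure, so the job would still be reported as failed; the only difference is that the traceback is written out first.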
from wandb.
Fair point. The code will be visible in this repo.
gpus: all
label:
- tutorial
volume:
- /mnt/space:/mnt/space
The main issue is that the error is not very descriptive, and the agent reports no error even with the verbose level specified.
from wandb.
Update: the issue seems to be that when a job is created from git, an entry point is required but is then not passed on to wandb. That caused the error, since the entry point had to be defined in Dockerfile.wandb.
from wandb.
I see, thanks for the context @arkadiusz-czerwinski . I will create a feature request to improve the error catching functionality for launch.
from wandb.