Comments (8)
This would be useful to have
from wandb.
Hi @BramVanroy, thank you for reporting this. Could you please give us some more context: what is your current compute infrastructure, and which ML frameworks do you mostly use?
Hi @BramVanroy, just following up on this. Could you provide some additional information on your current multi-node infrastructure so that we can include it in a feature request for our engineers? Thank you!
Hi Thanos
I run jobs on anything from 1 node with 1 GPU up to 10 nodes with 4 GPUs each. It seems to me that wandb does not correctly log hardware in multi-node settings.
Perfect, thank you @BramVanroy for the additional context. What is reported in those runs? If you navigate to the Files view and open the wandb-metadata.json file, what do the "cpu_count" and "cpu_count_logical" entries show? Does it not detect the correct hardware info when the run is multi-node?
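To check those two entries without opening the file by hand, a minimal sketch like the one below can read a copy of wandb-metadata.json (for example, one downloaded from the run's Files view). It assumes only what is described above: the file is a JSON object with "cpu_count" and "cpu_count_logical" at the top level. The helper name `read_cpu_counts` is hypothetical and not part of the wandb library.

```python
import json
from pathlib import Path


def read_cpu_counts(metadata_path):
    """Return the "cpu_count" and "cpu_count_logical" entries of a wandb-metadata.json file.

    Assumes the file holds a JSON object with these keys at the top level;
    a missing key is returned as None instead of raising.
    """
    with Path(metadata_path).open(encoding="utf-8") as f:
        metadata = json.load(f)
    return {
        "cpu_count": metadata.get("cpu_count"),
        "cpu_count_logical": metadata.get("cpu_count_logical"),
    }
```

Calling `read_cpu_counts("wandb-metadata.json")` on a multi-node run and comparing the result with the per-node hardware you expect makes it easy to see whether only one node was recorded.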
Correct. It only reports the hardware configuration of the main node, not of the whole pool.
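Until pool-wide reporting exists, one possible stopgap is to record a small hardware summary on every node yourself and add the records up. The sketch below is only an illustration under stated assumptions: `local_hardware` and `summarize_pool` are hypothetical helpers (not wandb APIs), each per-node record is assumed to use the same "cpu_count" / "cpu_count_logical" keys as wandb-metadata.json, and how the records reach one place (for example, an all-gather in your distributed framework) is left to your setup. Note that `os.cpu_count()` reports logical CPUs only; the standard library has no physical-core count.

```python
import os
import socket
from typing import Iterable, Mapping


def local_hardware() -> dict:
    """Hypothetical per-node record: hostname plus logical CPU count.

    os.cpu_count() counts logical CPUs and may return None if undetermined.
    """
    return {"host": socket.gethostname(), "cpu_count_logical": os.cpu_count()}


def summarize_pool(node_records: Iterable[Mapping[str, object]]) -> dict:
    """Sum per-node CPU counts into totals for the whole pool.

    Each record is assumed to describe one node. A key that is missing or
    not an integer on some node is simply left out of that key's total.
    """
    totals = {"nodes": 0, "cpu_count": 0, "cpu_count_logical": 0}
    for record in node_records:
        totals["nodes"] += 1
        for key in ("cpu_count", "cpu_count_logical"):
            value = record.get(key)
            if isinstance(value, int):
                totals[key] += value
    return totals
```

On a 10-node job, for instance, the gathered list would hold ten records, and the summary could then be attached to the main run's config so the whole pool is visible next to the main-node metadata.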
Great, thank you @BramVanroy for the clarification. I have logged this feature request with our engineers, and we will keep you updated here on any progress.
For NVIDIA GPUs, this may be a good workaround for this issue: https://wandb.ai/dimaduev/dcgm/reports/Monitoring-GPU-cluster-performance-with-NVIDIA-DCGM-Exporter-and-Weights-Biases--Vmlldzo0MDYxMTA1