Comments (10)
@unrue The number of epochs should remain constant regardless of the number of GPUs used; you do not need to rescale it when adding GPUs. An epoch is still one full pass over the dataset. With more GPUs, each epoch may finish sooner in wall-clock time because batches are processed in parallel, which can reduce total training time. If you have any more questions or need further clarification, feel free to ask. Happy to help!
from yolov5.
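To illustrate the point about epochs, here is a minimal sketch of the arithmetic. It assumes the per-GPU batch size is held fixed as GPUs are added, and the dataset size and batch size are hypothetical values chosen for illustration; `steps_per_epoch` is not a YOLOv5 function.

```python
import math


def steps_per_epoch(num_images: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Optimizer steps needed to pass over every image once."""
    global_batch = per_gpu_batch * num_gpus  # images consumed per step across all GPUs
    return math.ceil(num_images / global_batch)


num_images = 10_000  # hypothetical dataset size
epochs = 50          # unchanged between the 4-GPU and 8-GPU runs

for gpus in (4, 8):
    steps = steps_per_epoch(num_images, per_gpu_batch=16, num_gpus=gpus)
    images_seen = num_images * epochs  # same total regardless of GPU count
    print(f"{gpus} GPUs: {steps} steps/epoch, {images_seen} images seen in {epochs} epochs")
```

Doubling the GPUs roughly halves the steps per epoch, but the model still sees every image 50 times, so the epoch count itself does not need to change. If the total batch size is held fixed instead, the step count per epoch stays the same as well.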
@unrue hi there! Thanks for reaching out. This is a known behavior caused by the overhead of synchronizing across multiple GPUs and of inter-node communication. YOLOv5 and its multi-GPU capability are actively optimized, with scaling improvements ongoing. For updates, please see the training best practices in our documentation. If you have further questions or feedback, feel free to let us know. 🚀
Thanks Glenn,
so, in fact, multi-GPU multi-node training with YOLO is not useful at the moment. In the link above, I don't see any tips for improving multi-GPU performance. I'm using YOLO on an HPC cluster with a lot of GPUs available, but if YOLO does not scale up, I'm limited to running on a single node :/
I'll follow future updates. Thanks.
@unrue you're welcome! I appreciate your understanding. Our team is actively working to enhance multi-GPU and multi-node performance, and we value your feedback in this process. Your support and patience mean a lot. If you have any more questions or run into any issues, feel free to ask. We're here to help!
Thanks Glenn, apart from time performance, are there other reasons to enable multi-node in YOLO? Processing more data?
@unrue Absolutely, multinode setups can certainly enable larger-scale data processing and model training when dealing with massive datasets and resource-intensive tasks. This can be especially beneficial for distributed data parallel training or for handling extremely large models. Keep an eye on our updates for improvements and new features in this area. If you have any more questions, feel free to ask. Good luck with your work! 🌟
Thanks Glenn, yes, I have another question. Suppose YOLO starts with 4 GPUs and 50 epochs. In a second test, YOLO runs with 8 GPUs. In that case, should the number of epochs be 25, or does it stay the same? In other words, should the number of epochs be rescaled as the number of GPUs grows, or does it remain constant?
Thanks.
Do you already have an idea why YOLO does not scale? Where is the bottleneck?
@unrue The main bottleneck in scaling YOLOv5 across multiple GPUs and nodes is the communication and synchronization overhead between the GPUs. Our team is actively working to optimize and improve the scalability of YOLOv5, so keep an eye out for updates as we continue to address these challenges. Your feedback is invaluable as we work to enhance the multi-GPU and multi-node performance. If you have further questions or need assistance, feel free to ask. Thank you for your understanding and support!
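One way to quantify how much the communication overhead costs is to measure the time per epoch on one GPU and on N GPUs, then compute speedup and parallel efficiency. The sketch below assumes you have those two timings; the numbers used are hypothetical placeholders, not measurements of YOLOv5, and `scaling_efficiency` is an illustrative helper rather than part of any library.

```python
from typing import Tuple


def scaling_efficiency(t_single: float, t_multi: float, num_gpus: int) -> Tuple[float, float]:
    """Return (speedup, efficiency) for a run on num_gpus GPUs.

    t_single: seconds per epoch on 1 GPU
    t_multi:  seconds per epoch on num_gpus GPUs
    """
    speedup = t_single / t_multi
    efficiency = speedup / num_gpus  # 1.0 would be perfect linear scaling
    return speedup, efficiency


# Hypothetical timings in seconds per epoch, for illustration only.
timings = {1: 100.0, 4: 40.0, 8: 32.0}

for gpus, seconds in timings.items():
    speedup, eff = scaling_efficiency(timings[1], seconds, gpus)
    print(f"{gpus} GPUs: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

An efficiency that drops sharply as GPUs or nodes are added is consistent with synchronization and communication dominating the step time, as opposed to per-GPU compute.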
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐