Comments (6)
I regularly write MPI code, so this shouldn't be too complicated to implement. I've started to look through the CPU version to get started. However, I do have questions regarding the ML side.
There are a few options I can see:
- Data parallelism, using MPI_Allreduce to average gradients. I think we would do this around here: https://github.com/karpathy/llm.c/blob/master/train_gpt2.c#L906C1-L906C5
- Tensor parallelism (similar to llama.cpp)
- Model parallelism

Is there a preference for how this should be scaled with MPI? If option 2 or 3 seems like the better choice, do you have a suggestion as to where in the code I should dig in?
from llm.c.
Sounds great! I expect to get started with the backward pass somewhere over the weekend most likely.
(I spent today optimizing the forward pass still)
Once we have the backward pass, getting data-parallel training in will be super awesome.
from llm.c.
The MPI version of this is mostly working at this point. I've tested it on up to 8 nodes, and it reduces training time by many hours.
@karpathy Do you still have interest in an NCCL version? If so, are there any multi-GPU resources you could share?
from llm.c.
I have this in mind for the Mojo target issue - which is really about having the Makefile support composability like the one for llama.cpp. We'd probably copy-paste most of what llama.cpp has so the build uses mpicc. We would still need to write the MPI code.
from llm.c.
definitely! but this is pretty far down the line, i think we first need to get the 1-GPU version to be super solid.
from llm.c.
I would go with MPI-2, as MPI-IO is all you need and it is the most widely supported.
from llm.c.