This repository contains scripts to plug in any open-source LLM and fine-tune it for custom downstream tasks.
- DeepSpeed ZeRO-Infinity optimizations to fit models larger than GPU memory by offloading gradients and optimizer states to CPU memory and NVMe (SSD)
- The Accelerate library to distribute workloads across multiple GPUs within a node
- PEFT LoRA adapters, including merging them back into the pretrained base LLM
- A vLLM inference server, which handles continuous batching and can produce 23x higher throughput than naive pipelines
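The ZeRO-Infinity offload described above is typically driven by a DeepSpeed config. A minimal sketch of what such a config might look like — the NVMe path, batch size, and accumulation steps are illustrative placeholders, not values taken from this repo:

```python
# Minimal DeepSpeed ZeRO-3 (ZeRO-Infinity) config sketch.
# "nvme_path" and the batch/accumulation values below are illustrative placeholders.
ds_config = {
    "zero_optimization": {
        "stage": 3,                          # partition params, grads, and optimizer states
        "offload_optimizer": {               # push optimizer states off the GPU
            "device": "nvme",
            "nvme_path": "/local_nvme",
        },
        "offload_param": {"device": "cpu"},  # keep idle parameters in CPU RAM
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
}

# Sanity check: offload targets must be ones DeepSpeed accepts.
for section in ("offload_optimizer", "offload_param"):
    assert ds_config["zero_optimization"][section]["device"] in {"cpu", "nvme"}
```

Such a dict can be passed to `deepspeed.initialize(..., config=ds_config)` or saved as JSON and referenced from the launcher command line.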
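Merging a LoRA adapter into the base model (what PEFT's `merge_and_unload()` performs layer by layer) amounts to folding the low-rank update into the frozen weight. A NumPy sketch of the arithmetic for a single linear layer — the dimensions and config names (`r`, `lora_alpha`) mirror common LoRA settings but are illustrative:

```python
import numpy as np

# Sketch of LoRA merging for one linear layer: W_merged = W + (alpha / r) * B @ A.
# Dimensions and values are illustrative, not from this repo's scripts.
rng = np.random.default_rng(0)

d_out, d_in = 8, 8
r, lora_alpha = 2, 4          # low-rank dimension and scaling numerator
scaling = lora_alpha / r

W = rng.standard_normal((d_out, d_in))       # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01    # LoRA down-projection (trained; B starts at zero in practice)
B = rng.standard_normal((d_out, r)) * 0.01   # LoRA up-projection (trained)

# During training the adapted layer computes: y = W x + scaling * B @ A @ x.
# Merging folds the adapter into W so inference needs no extra matmuls:
W_merged = W + scaling * (B @ A)

x = rng.standard_normal(d_in)
y_adapter = W @ x + scaling * (B @ A @ x)
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged)      # merged layer is numerically equivalent
```

The merged weight has the same shape as the base weight, which is why a merged checkpoint can be served by vLLM with no adapter-specific code.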
Required packages are documented within each script.
- Shreyas S K ([email protected])