llmlens.c

This is a fork of Karpathy's llm.c in the spirit of Neel Nanda's TransformerLens. The current goal is to enable efficient, easy Sparse Autoencoder (SAE) training and feature visualization in easy-to-read pure C/CUDA. Eventually, I would like to add a web server that a Python notebook can query with inputs to visualize as features. I was motivated to make this because much of the current SAE training code is hard to work with, and I suspect there is considerable overhead in the way current tools access activations for training SAEs.
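For readers new to SAEs, the computation being trained can be sketched in plain C. This is a minimal illustration of a standard sparse autoencoder forward pass (ReLU encoder, linear decoder, reconstruction error plus an L1 sparsity penalty), not code from this repository. The function name `sae_forward`, the row-major weight layout, and the `l1_coeff` parameter are all assumptions made for the example.

```c
#include <stddef.h>

// Hypothetical SAE forward pass for a single activation vector x of size d_model.
// Assumed layout (not taken from this repo): W_enc is [n_feat x d_model] and
// W_dec is [d_model x n_feat], both row-major.
// Writes features to f (size n_feat) and the reconstruction to x_hat (size d_model).
// Returns mean squared reconstruction error + l1_coeff * sum of feature activations.
float sae_forward(const float* x, const float* W_enc, const float* b_enc,
                  const float* W_dec, const float* b_dec,
                  float* f, float* x_hat,
                  int d_model, int n_feat, float l1_coeff) {
    float l1 = 0.0f;
    // Encoder: f = ReLU(W_enc * x + b_enc)
    for (int j = 0; j < n_feat; j++) {
        float acc = b_enc[j];
        for (int i = 0; i < d_model; i++) {
            acc += W_enc[(size_t)j * d_model + i] * x[i];
        }
        f[j] = acc > 0.0f ? acc : 0.0f;
        l1 += f[j];  // f[j] is non-negative after ReLU, so this is |f[j]|
    }
    float mse = 0.0f;
    // Decoder: x_hat = W_dec * f + b_dec
    for (int i = 0; i < d_model; i++) {
        float acc = b_dec[i];
        for (int j = 0; j < n_feat; j++) {
            acc += W_dec[(size_t)i * n_feat + j] * f[j];
        }
        x_hat[i] = acc;
        float diff = acc - x[i];
        mse += diff * diff;
    }
    return mse / (float)d_model + l1_coeff * l1;
}
```

In practice `x` would be an activation vector captured from the model during a forward pass, and the loop would run over a batch of such vectors on the GPU.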

I plan to support single GPUs first and multiple GPUs second.

quick start (GPU)

The "I don't care about anything I just want to train and I have a GPU" section. Run:

pip install -r requirements.txt
python prepro_tinyshakespeare.py
python train_gpt2.py
make train_gpt2fp32cu
./train_gpt2fp32cu

The above lines (1) download the tinyshakespeare dataset and tokenize it with the GPT-2 tokenizer, (2) download and save the GPT-2 (124M) weights, and (3) initialize from them in C/CUDA, train for one epoch on tinyshakespeare with AdamW (batch size 4, context length 1024, 74 steps in total), evaluate validation loss, and sample some text. Note that this quick start uses the fp32 version of the CUDA code, train_gpt2_fp32.cu. Below in the CUDA section we document the current "mainline" train_gpt2.cu, which is still being very actively developed, uses mixed precision, and runs ~2X faster.
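The step count follows from the batch shape. With batch size 4 and context length 1024, each step consumes 4 * 1024 = 4096 tokens, so 74 steps cover 303,104 tokens. The sketch below spells out that arithmetic. The helper names are hypothetical, and dropping the final partial batch is an assumption about how the epoch is counted, not something stated above.

```c
#include <stdint.h>

// Tokens consumed by one optimizer step: B sequences of T tokens each.
int64_t tokens_per_step(int64_t B, int64_t T) {
    return B * T;
}

// Number of full batches in one pass over num_tokens tokens.
// Assumption: tokens left over after the last full batch are dropped.
int64_t steps_per_epoch(int64_t num_tokens, int64_t B, int64_t T) {
    return num_tokens / tokens_per_step(B, T);
}
```

Under that assumption, any training split holding between 303,104 and 307,199 tokens gives 74 steps at this batch shape.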

quick start (multiple GPUs)

You'll be using the (more bleeding edge) mixed precision version of the code:

sudo apt install openmpi-bin openmpi-doc libopenmpi-dev
pip install -r requirements.txt
python prepro_tinyshakespeare.py
python train_gpt2.py
make train_gpt2cu
mpirun -np <number of GPUs on your machine> ./train_gpt2cu
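mpirun -np N starts N copies of the program, typically one per GPU. One common way to keep those processes from training on the same tokens is strided sharding: on every step each process reads its own B*T-token slice, and all processes then skip ahead past the slices the others consumed. The sketch below illustrates that scheme only. The function name `shard_offset` is hypothetical, and the scheme itself is an assumption rather than a description of what train_gpt2cu does.

```c
#include <stdint.h>

// Token offset at which process `rank` starts reading its batch on step `step`,
// assuming num_procs processes each consume B*T tokens per step in rank order.
int64_t shard_offset(int64_t step, int rank, int num_procs, int64_t B, int64_t T) {
    return (step * num_procs + rank) * B * T;
}
```

With 2 processes, batch size 4, and context length 1024, rank 0 would read from offsets 0, 8192, 16384, ... and rank 1 from 4096, 12288, 20480, ..., so the two never overlap.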

license

MIT

contributors

karpathy, ngc92, ademeure, lancerts, laserbear, dagelf, janeillario, ricardicus, rosslwheeler, chrisdryden, al0vya, peterzhizhin, yushengsu-thu, harryjackson, patricxu, nopperl, austinvhuang, azret, soldy, msharmavikram, ent0n29, varunlakkur, tojen, scotthaleen, saimirbaci, onuralpszr, dorjeduck, zocterminal, vincigit00, krrishnarraj
